Principal Site Reliability Engineer AI Platform Architecture

  • Full Time
  • Anywhere

Link Group — Kraków, małopolskie

Key Responsibilities:

  • Defining the reliability architecture for AI compute services, including SLO frameworks, fault tolerance patterns, and advanced capacity planning models.
  • Driving hands-on development of automation and tooling that scales the SRE team's impact and eliminates operational toil.
  • Designing a comprehensive observability strategy, leveraging existing platforms to build specialized t…

To apply for this job, please visit www.adzuna.com.
