As part of the Site Reliability Engineering (SRE) team, you’ll contribute to designing, automating, and evolving mission-critical systems. You’ll combine deep systems expertise with modern software engineering practices to reduce operational toil and build resilient, self-healing services.
This is a high-impact role where your work directly affects the reliability of cloud services used by thousands of customers around the world.
**What You’ll Do**:
- Collaborate with SRE and development teams to ensure end-to-end reliability across a wide range of services and technology stacks.
- Design, write, and deploy software and automation tools that enhance availability, observability, and scalability.
- Own and evolve metrics, SLOs, SLAs, KPIs, and dashboards that track system health and customer experience.
- Build tooling to reduce manual operations and eliminate sources of toil.
- Improve CI/CD pipelines, deployment processes, and validation frameworks for reliability ...