Job Description
Responsibilities
- Define and maintain SLIs and SLOs, and monitor alignment with them and error‑budget usage.
- Lead incident response and post‑mortems, and implement corrective measures.
- Automate operational tasks with tooling such as auto‑remediation and scaling rules.
- Build, improve, and maintain CI/CD pipelines, canary deployments, and blue/green strategies.
- Lead technical discussions with customers to align on reliability, scalability, and performance requirements.
- Drive continuous platform improvements across the service lifecycle, including architecture, monitoring, and operational processes.
- Implement and extend observability systems: metrics, tracing, and log aggregation.
- Optimize performance and cost by tuning cloud services, autoscaling, and resource rightsizing.
- Design, deploy, and operate containerized workloads using Docker and Kubernetes in production environments.
- Collaborate wi...