Found Description
We are seeking a Lead Site Reliability Engineer with substantial expertise in enhancing the reliability, availability, performance and scalability of production environments.
The right candidate will bring a strong software engineering mindset paired with deep operational knowledge, cloud expertise, automation capabilities and practical incident management experience. This position centers on engineering dependable systems, minimizing operational toil, strengthening observability and supporting engineering teams in delivering services that align with established reliability targets.

Architect and deliver solutions that enhance system reliability, availability and performance
Establish and track SLIs, SLOs and error budgets
Develop automation that eliminates manual operational effort and recurring tasks
Enhance monitoring, logging, tracing and alerting capabilities
Engage in incident response, root cause investigation and postmortems
Partner with development teams to strengthen servi...
Submit your application for Lead Site Reliability Engineer at Epam Systems