Site Reliability Engineer

Otyms Consultings Services Inc.

México, México, Mexico · Full-time · June 18, 2026

Job Description

This role is responsible for building, operating, and scaling highly reliable AI/ML and cloud infrastructure platforms.
The position combines Site Reliability Engineering (SRE), Platform Engineering, and AI Operations (AIOps) to ensure production systems remain stable, automated, and scalable.

Key Responsibilities

- Build and scale agentic AI systems for incident triage, anomaly detection, and self-healing automation.
- Maintain and improve the reliability and performance of AI/ML model-serving infrastructure.
- Operate, optimize, and scale distributed cloud-native systems.
- Drive automation initiatives to reduce manual operational work and improve efficiency.
- Define and manage SLOs, monitoring, observability, and incident-response processes.
- Participate in troubleshooting, root-cause analysis, and continuous system improvement.

Required Skills & Experience

- 5+ years of experience in SRE, Production Engineering, or Platform Engineering.
- Hands-on expertise with cloud platforms such as AWS, GCP, or...
