Site Reliability Engineer

Otyms Consultings Services Inc.

Mexico, Mexico, Mexico | Full-time | June 17, 2026

Job Description

This role is responsible for building, operating, and scaling highly reliable AI/ML and cloud infrastructure platforms. The position combines Site Reliability Engineering (SRE), Platform Engineering, and AI Operations (AIOps) to ensure production systems remain stable, automated, and scalable.


Key Responsibilities

  • Build and scale agentic AI systems for incident triage, anomaly detection, and self-healing automation.
  • Maintain and improve the reliability and performance of AI/ML model-serving infrastructure.
  • Operate, optimize, and scale distributed cloud-native systems.
  • Drive automation initiatives to reduce manual operational work and improve efficiency.
  • Define and manage SLOs, monitoring, observability, and incident response processes.
  • Participate in troubleshooting, root-cause analysis, and continuous system improvement.


Required Skills & Experience

