Found Description
We are looking for a highly experienced DevOps / Site Reliability Engineer (SRE) to support and operate mission-critical production systems across hybrid environments. The ideal candidate will have strong expertise in incident management, CI/CD, Kubernetes operations, and cloud infrastructure (AWS/Azure). You will play a key role in ensuring system reliability, deployment stability, and rapid incident resolution, working closely with engineering and support teams.

Key Responsibilities

Production Operations & Incident Response (Primary)
- Support 24x7 production systems for services and integrations
- Participate in on-call rotation (primarily weekdays)
- Troubleshoot incidents across:
  - CI/CD pipelines
  - Kubernetes clusters
  - API Gateway
  - Networking and applications
- Perform incident triage, mitigation, and recovery
- Ensure safe deployments with rollback mechanisms

Technical Skills (Mandatory)
- Kubernetes Operations (deployment, troubleshooting, scaling)
- CI/CD Tools: GitHub Actions, Azure DevOps, O...