Your role
As a Site Reliability Engineer (SRE), you will bridge the gap between development and operations, ensuring our systems are reliable, scalable, and performant. You'll work in a dynamic L4 Support environment where your expertise in automation, monitoring, and incident response will be crucial for maintaining service excellence.
Responsibilities
- Designing, implementing, and maintaining infrastructure automation using Python/Bash scripting and infrastructure‑as‑code tools.
- Managing and optimizing Kubernetes clusters and containerized applications in Linux environments.
- Creating and enhancing monitoring systems to ensure high availability and performance of critical services.
- Developing automated solutions for incident response, capacity planning, and system recovery.
- Collaborating with development teams to improve application reliability and scalability.
- Participating in on‑call rotations ...
Ready to Apply?
Submit your application for DevOps Engineer (Colombia) at Capgemini Engineering