Found Description
Key Responsibilities
- Serve on the SRE team supporting mission‑critical UPS.com applications (24/7 on‑call rotation required).
- Design and operate highly available, scalable cloud infrastructure in GCP and Azure.
- Drive SRE best practices including SLOs, SLIs, error budgets, and incident management.
- Build and evolve internal developer platforms to enable self‑service and accelerate delivery.
- Manage and optimize Kubernetes environments (GKE/OpenShift), including operators and service mesh.
- Implement Infrastructure as Code using Terraform and Config Connector.
- Develop CI/CD pipelines and GitOps workflows using Argo CD and Azure Pipelines.
- Enhance observability through monitoring, logging, and tracing (Prometheus, Grafana, Dynatrace).
- Automate operational workflows using AI, Python, shell scripting, and Ansible.
- Implement security best practices including secrets management (Vault) and policy e...
Ready to Apply?
Submit your application for Site Reliability Engineer at Creative Solutions Services, LLC