Job description
Purpose
This role is responsible for maintaining SLA-critical (service-level agreement) production platforms or products and for providing automated operations to ensure that the service to our clients is always of the best quality.
Responsibilities
- Harden platforms before and after go-live by reviewing architecture, security, and configurations, and by implementing monitoring.
- Ensure reliability by monitoring availability, performance, and overall system health across infrastructure and applications.
- Lead incident response and recovery to meet SLAs, leveraging strong infrastructure expertise to restore services quickly.
- Conduct root cause analysis and post-mortems to drive continuous reliability improvements.
- Collaborate with development and product teams to enhance scalability, resilience, and operational readiness.
- Validate release readiness through CAB participation, automated testing,...