Job Overview
We are looking for a DevOps Engineer to help maintain production Kubernetes-based systems for a major technology company that specializes in infrastructure supporting AI research. This position brings together site reliability engineering, observability and SQL production support duties, with a clear focus on monitoring, metrics, dashboards and operational excellence. The right candidate will partner with established engineering and research teams to uphold system reliability, resolve production issues and steadily strengthen visibility into system health and performance across an Azure Stack environment.
Responsibilities
- Design, maintain and progressively improve observability solutions, including dashboards and visual reports built with Grafana or comparable monitoring tools
- Define, implement and manage metrics, SLIs, SLOs and alerting strategies that ensure reliability and transparency across production systems
- Del...