Found Description
Job Main Responsibilities
- Monitor data pipelines and platform health
- Manage incidents and operational issues across the data platform
- Perform root-cause analysis and implement corrective actions
- Support reliability, performance, and cost optimization
- Collaborate with engineering teams to improve platform robustness
Technical Skills & Technology Landscape
- Cloud data platform operations
- Pipeline monitoring, alerting, and observability
- Incident management and operational support
- Performance and cost optimization (FinOps mindset)
Qualifications And Skills
- Core Technical
- Platform observability: Azure Monitor, Databricks monitoring, pipeline alerting (PagerDuty, OpsGenie)
- Incident management: triage, escalation, RCA documentation, post-mortems
- Infrastructure-as-Code: Terraform, Bicep, ARM templates for Azure ...