Found Description
Role Introduction
Builds and operates end-to-end data pipelines (ingestion, transformation, and orchestration) across the lakehouse stack, deploying through CI/CD onto containerized infrastructure.
Features
- Onsite
Requirements
- Design and build batch and streaming ingestion pipelines using Airflow, Kafka, and Spark.
- Develop transformation logic and ETL/ELT workflows using Informatica IDMC alongside custom Spark jobs where needed.
- Containerize pipeline code and deploy via GitLab CI/CD onto Docker/Kubernetes infrastructure.
- Write and maintain Airflow DAGs with proper dependency management, retries, and SLA monitoring.
- Implement data quality checks and validation logic at each stage of the pipeline.
- Optimize Spark jobs that write into Iceberg tables for performance and cost (partitioning, caching, shuffle management).
- Collaborate with the Data Modeler and Data A...