Job Description
What success looks like in this role:
· End-to-End Pipeline Engineering: Build and automate robust ETL/ELT pipelines using Azure Data Factory (ADF), AWS Glue, and Apache Airflow.
· Distributed Computing: Develop large-scale data processing jobs using PySpark and Scala within Databricks or EMR environments.
· Streaming & Real-time Integration: Design and implement real-time data ingestion and processing layers using Apache Kafka, Confluent, or AWS Kinesis.
· Data Lakehouse: Manage and optimize cloud storage using ADLS Gen2 and S3, implementing ACID transactions with Delta Lake or Apache Iceberg.
· Advanced Data Modeling: Design highly performant schemas for cloud data warehouses like Snowflake, Amazon Redshift, or Google BigQuery.
· Data Transformation & Quality: Use dbt (data build tool) for modeling and implement automated quality checks using Great Expectations or Soda.
· Infrastructure & CI/CD: Deploy and manage data infrastructure...