Position: Data Engineer (CE60SF RM 3570)
Shift timing: General Shift
Relevant experience required: 4+ years
Education required: Bachelor's/Master's/PhD; B.E. (Computers) or MCA preferred
Must have skills:
- Azure Data Factory and Databricks (PySpark/SQL)
- Python (pandas/PySpark)
- SQL
- CI/CD (Azure DevOps/GitHub Actions/Jenkins)
- Exposure to drift detection and pipeline observability
- Cloud platform (Azure preferred: ADLS Gen2, Key Vault, Databricks, ADF basics)
- Data Governance & Security
Good to have:
- Databricks Asset Bundles (DAB)
- Power BI exposure
- Streaming/real-time: Kafka/Event Hubs; CDC tools (e.g., Debezium, ADF/Synapse CDC)
- MLOps touchpoints: MLflow tracking/registry, feature tables, basic model-inference pipelines
Role Summary
Build and operate scalable, reliable data pipelines on Azure. Develop batch and streaming ingestion, transform data using Databricks (PySpark/SQL), enforce data quality, and publish curated datasets for analytics and ML.
Key Responsibilities
- Design, build, and maintain ETL/ELT pipelines in Azure Data Factory and Databricks across Bronze → Silver → Gold layers.
- Implement Delta Lake best practices (ACID, schema evolution, MERGE/upsert, time travel, Z-ORDER).
- Write performant PySpark and SQL; tune jobs (partitioning, caching, join strategies).
- Create reusable components; manage code in Git; contribute to CI/CD pipelines (Azure DevOps/GitHub Actions/Jenkins).
- Apply data quality checks (Great Expectations or custom validations), monitoring, drift detection, and alerting.
- Model data for analytics (star/dimensional); publish to Synapse/Snowflake/SQL Server.
- Uphold governance and security (Purview/Unity Catalog lineage, RBAC, tagging, encryption, PII handling).
- Author documentation/runbooks; support production incidents and root-cause analysis; suggest cost/performance improvements.
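The MERGE/upsert step in the Bronze → Silver flow mentioned above can be illustrated with a lightweight stand-in. The sketch below uses SQLite's `INSERT ... ON CONFLICT` to show the same keyed-upsert idea that Delta Lake's `MERGE INTO` provides on Databricks; the table and column names are hypothetical, not part of this role's systems.

```python
import sqlite3

# In-memory stand-in for a Silver-layer table keyed by customer_id.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE silver_customers "
    "(customer_id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)"
)

# Existing Silver row.
conn.execute("INSERT INTO silver_customers VALUES (1, 'Ada', '2024-01-01')")

# New Bronze batch: one update (id 1) and one new insert (id 2).
bronze_batch = [(1, "Ada Lovelace", "2024-02-01"), (2, "Grace", "2024-02-01")]

# Keyed upsert: the effect Delta Lake's MERGE INTO achieves at scale.
conn.executemany(
    """
    INSERT INTO silver_customers (customer_id, name, updated_at)
    VALUES (?, ?, ?)
    ON CONFLICT(customer_id) DO UPDATE SET
        name = excluded.name,
        updated_at = excluded.updated_at
    """,
    bronze_batch,
)

rows = conn.execute(
    "SELECT * FROM silver_customers ORDER BY customer_id"
).fetchall()
print(rows)  # [(1, 'Ada Lovelace', '2024-02-01'), (2, 'Grace', '2024-02-01')]
```

On Databricks the equivalent is a single `MERGE INTO silver ... WHEN MATCHED THEN UPDATE ... WHEN NOT MATCHED THEN INSERT ...` against a Delta table, which also gives the ACID and time-travel guarantees listed above.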
Must-Have (Mandatory)
- Data Engineering & Pipelines
o Hands-on experience building production pipelines with Azure Data Factory and Databricks (PySpark/SQL).
o Working knowledge of Medallion Architecture and Delta Lake (schema evolution, ACID).
- Programming & Automation
o Strong Python (pandas/PySpark) and SQL.
o Practical Git workflow; experience integrating pipelines into CI/CD (Azure DevOps/GitHub Actions/Jenkins).
o Familiarity with packaging reusable code (e.g., Python wheels) and configuration-driven jobs.
- Data Modelling & Warehousing
o Solid grasp of dimensional modelling/star schemas; experience with Synapse, Snowflake, or SQL Server.
- Data Quality & Monitoring
o Implemented validation checks and alerts; exposure to drift detection and pipeline observability.
- Cloud Platforms (Azure preferred)
o ADLS Gen2, Key Vault, Databricks, ADF basics (linked services, datasets, triggers), environment promotion.
- Data Governance & Security
o Experience with metadata/lineage (Purview/Unity Catalog), RBAC, secrets management, and secure data sharing.
o Understanding of PII/PHI handling and encryption at rest/in transit.
- Collaboration
o Clear communication, documentation discipline, Agile ways of working, and code reviews.
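The drift-detection requirement above can be sketched with a minimal, dependency-free check: compare a batch's summary statistics against a reference baseline and flag when the shift exceeds a tolerance. This is a toy stand-in for real observability tooling, and the threshold and sample values are illustrative assumptions.

```python
import statistics

def detect_mean_drift(reference, batch, threshold=2.0):
    """Flag drift when the batch mean moves more than `threshold`
    reference standard deviations away from the reference mean.
    A toy stand-in for production drift-monitoring tooling."""
    ref_mean = statistics.mean(reference)
    ref_std = statistics.stdev(reference)
    shift = abs(statistics.mean(batch) - ref_mean)
    return shift > threshold * ref_std

# Reference profile computed from a historical "good" window.
reference = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0]

# A healthy batch vs. one whose distribution has shifted.
assert not detect_mean_drift(reference, [10.0, 9.9, 10.1])
assert detect_mean_drift(reference, [12.5, 12.7, 12.4])
```

In practice such a check would run as a post-load step in the pipeline and raise an alert (rather than an assertion) when drift is detected.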
Good-to-Have
- Databricks Asset Bundles (DAB) for environment promotion/infra-as-code style deployments.
- Streaming/real-time: Kafka/Event Hubs; CDC tools (e.g., Debezium, ADF/Synapse CDC).
- MLOps touchpoints: MLflow tracking/registry, feature tables, basic model-inference pipelines.
- Power BI exposure for publishing curated tables and building operational KPIs.
- DataOps practices: automated testing, data contracts, lineage-aware deployments, cost optimization on Azure.
- Certifications: Microsoft Certified: Azure Data Engineer Associate (DP-203) or equivalent.
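The data-contract idea listed under DataOps practices can be sketched in a few lines: each record must carry the agreed columns with the agreed types before it is accepted downstream. The contract, column names, and records below are hypothetical examples, not part of this role's systems.

```python
# A minimal, hypothetical data contract: required columns and their types.
CONTRACT = {"order_id": int, "amount": float, "currency": str}

def violates_contract(record):
    """Return a list of contract violations for one record (empty = valid)."""
    problems = []
    for column, expected_type in CONTRACT.items():
        if column not in record:
            problems.append(f"missing column: {column}")
        elif not isinstance(record[column], expected_type):
            problems.append(f"bad type for {column}: {type(record[column]).__name__}")
    return problems

# A conforming record passes; a malformed one reports each violation.
assert violates_contract({"order_id": 1, "amount": 9.99, "currency": "EUR"}) == []
assert violates_contract({"order_id": "1", "amount": 9.99}) == [
    "bad type for order_id: str",
    "missing column: currency",
]
```

In an automated-testing setup, checks like this would run in CI against sample batches so that producer-side schema changes fail the build instead of breaking downstream consumers.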
Qualifications
- 4–6 years of professional experience in data engineering (or equivalent project depth).
- Bachelor’s/Master’s in CS/IT/Engineering or related field (or equivalent practical experience).
*******************************************************************************************************************************************
Apply for this position
Please provide accurate information below and list skills aligned with the job description you are applying for; this will help us process your application smoothly.