Data Engineer (5+ Years Experience)
Mumbai, Maharashtra, India
Job Description
**We are considering immediate joiners only.** Interested candidates may send their resumes to [email protected], including their CTC, ECTC, notice period, and a short overview of their relevant ML and Data Engineering work.

About the Role
We are seeking an experienced Data Engineer (5+ years) with strong, hands-on experience in ML/AI data workflows. You will play a critical role in building the feature pipelines, model-serving data flows, and end-to-end orchestration that power our production ML systems. This is a fully remote, ownership-driven position.

Key Responsibilities

ML / AI Data Engineering
- Build and maintain feature engineering pipelines for ML model training and inference.
- Develop and optimize model-serving data pipelines, ensuring low-latency, reliable delivery.
- Design and orchestrate end-to-end ML workflows (Airflow, Prefect, Dagster, Kubeflow, etc.).
- Work closely with Data Scientists and ML Engineers to productionize ML models.
- Implement automated dataset versioning, feature stores, and reproducibility frameworks.
- Build the scalable data foundations required for MLOps: monitoring, retraining triggers, and model data validation.

Data Pipelines & ETL
- Design and build high-performance ETL/ELT pipelines for structured and unstructured data.
- Manage ingestion from APIs, databases, files, event streams, and cloud storage.
- Ensure pipelines are fault-tolerant, well-monitored, and automated.

Data Modelling & Data Warehousing
- Build and maintain data models, marts, and warehouse layers to support analytics and ML pipelines.
- Translate ML feature requirements into clean, optimized data structures.

Data Quality & Governance
- Implement schema validation, data quality checks, and automated monitoring.
- Maintain metadata, lineage, and documentation for all data flows.

Cloud & Infrastructure
- Develop cloud-native data workflows (AWS / Azure / GCP).
- Work with data storage and compute systems such as S3, BigQuery, Snowflake, Databricks, and Redshift.
- Ensure performance optimization, scaling, and cost-efficiency.

DevOps, CI/CD & Automation
- Build CI/CD pipelines for data and ML workflows.
- Containerize pipelines using Docker and manage deployments via Git-based workflows.
- Automate scheduling, builds, and monitoring for data and ML systems.

Required Skills & Experience
- 5+ years of experience as a Data Engineer.
- Major requirement (non-negotiable): strong experience working on ML/AI projects, including:
  - ML feature pipelines
  - Model-serving data workflows
  - ML orchestration (Airflow, Prefect, Dagster, Kubeflow, etc.)
- Strong in Python, SQL, and ETL frameworks.
- Experience with big data technologies (Spark, PySpark, Databricks).
- Hands-on with cloud platforms (AWS/Azure/GCP).
- Experience with CI/CD, Docker, Git, and APIs.
- Ability to work independently and in cross-functional remote teams.
- Excellent communication and documentation skills.

Nice-to-Have Skills
- ML tooling: MLflow, Vertex AI, SageMaker, Azure ML
- Streaming: Kafka, Kinesis, Pub/Sub
- Data quality frameworks: Great Expectations, Soda, Pandera

Why Join Us
- Fully remote with a flexible schedule
- Work on real-world ML/AI production systems
- High ownership and direct architectural influence
- Opportunity to collaborate with advanced Data Science & ML teams
Additional Information
- Company Name
- Volga Infotech
- Industry
- N/A
- Department
- N/A
- Role Category
- DevOps Engineer
- Job Role
- Mid-Senior level
- Education
- No Restriction
- Job Types
- Hybrid
- Gender
- No Restriction
- Notice Period
- Less Than 30 Days
- Year of Experience
- 1 - Any Yrs
- Job Posted On
- 4 weeks ago
- Application Ends
- 56 minutes left to apply