Spark Data Engineer

Mumbai, Maharashtra, India

4 weeks ago

Applicants: 0

Salary Not Disclosed

Job Description

Company: Mactores
Business Type: Startup
Company Type: Service
Business Model: B2B
Funding Stage: Pre-seed
Industry: Data Analytics

Job Description

Mactores is a trusted leader in providing modern data platform solutions to businesses. Since 2008, Mactores has been enabling businesses to accelerate their value through automation by providing end-to-end data solutions that are automated, agile, and secure. We collaborate with customers to strategize, navigate, and accelerate an ideal path forward in their digital transformation through assessments, migration, or modernization.

We are seeking a highly skilled and innovative Spark Engineer to join our team. In this role, you will design, develop, optimize, and operationalize high-performance data pipelines and applications using Apache Spark. The role requires hands-on expertise in distributed data processing, ETL engineering, performance tuning, and cluster management, as well as the ability to work with cross-functional teams to deliver reliable, scalable, and efficient data solutions.

What You Will Do

- Architect, design, and build scalable data pipelines and distributed applications using Apache Spark (Spark SQL, DataFrames, RDDs).
- Develop and manage ETL/ELT pipelines that process structured and unstructured data at scale.
- Write high-performance code in Scala or PySpark for distributed data processing workloads.
- Optimize Spark jobs by tuning shuffle, caching, partitioning, memory, executor cores, and cluster resource allocation.
- Monitor and troubleshoot Spark job failures, cluster performance, bottlenecks, and degraded workloads.
- Debug production issues using logs, metrics, and execution plans to maintain SLA-driven pipeline reliability.
- Deploy and manage Spark applications on on-premises or cloud platforms (AWS, Azure, or GCP).
- Collaborate with data scientists, analysts, and engineers to design data models and enable self-serve analytics.
- Implement best practices for data quality, data reliability, security, and observability.
- Support cluster provisioning, configuration, and workload optimization on platforms such as Kubernetes, YARN, or EMR/Databricks.
- Maintain version-controlled codebases, CI/CD pipelines, and deployment automation.
- Document architecture, data flows, pipelines, and runbooks for operational excellence.

What We Are Looking For

- Bachelor's degree in Computer Science, Engineering, or a related field.
- 4+ years of experience building distributed data processing pipelines, with deep expertise in Apache Spark.
- Strong understanding of Spark internals (Catalyst optimizer, DAG scheduling, shuffle, partitioning, caching).
- Proficiency in Scala and/or PySpark, with strong software engineering fundamentals.
- Solid expertise in ETL/ELT, distributed computing, and large-scale data processing.
- Experience with cluster and job orchestration frameworks.
- Strong ability to identify and resolve performance bottlenecks and production issues.
- Familiarity with data security, governance, and data quality frameworks.
- Excellent communication and collaboration skills for working with distributed engineering teams.
- Ability to work independently and deliver scalable solutions in a fast-paced environment.

You Will Be Preferred If You Have

- Experience with Databricks, AWS EMR, AWS Glue (Spark), or GCP Dataproc.
- Familiarity with workflow orchestration tools such as Apache Airflow, Dagster, or Prefect.
- Exposure to streaming platforms such as Kafka, Kinesis, or Pub/Sub.
- Experience running Spark workloads on Kubernetes.
- Familiarity with data warehouse and lakehouse ecosystems (Snowflake, BigQuery, Redshift; Iceberg, Delta Lake, Hudi).
- Understanding of DevOps practices, CI/CD, and IaC (Terraform, CloudFormation).
- Knowledge of distributed logging and monitoring tools (Grafana, Prometheus, CloudWatch, ELK).
- Prior experience in high-scale production environments or on data platform teams.

Additional Information

Company Name
SourcingXPress
Industry
N/A
Department
N/A
Role Category
N/A
Job Role
Mid-Senior level
Education
No Restriction
Job Types
Hybrid
Employment Types
Full-Time
Gender
No Restriction
Notice Period
Less Than 30 Days
Year of Experience
1 - Any Yrs
Job Posted On
4 weeks ago
Application Ends
N/A
