Pyspark Data Engineer with Databricks

Capgemini | New York, NY
Posted 5 days ago | $90,000 - $110,000

About The Position

Choosing Capgemini means choosing a company where you will be empowered to shape your career in the way you’d like, where you’ll be supported and inspired by a collaborative community of colleagues around the world, and where you’ll be able to reimagine what’s possible. Join us and help the world’s leading organizations unlock the value of technology and build a more sustainable, more inclusive world.

Job Location: New York, NY

Job Description

We are looking for a hands-on, mid- to senior-level PySpark Data Engineer with Databricks who can design, build, and own production-grade data pipelines and platform components. This role requires strong expertise in Python/PySpark, Databricks, and Snowflake, with a focus on building scalable, cost-efficient, and reliable data systems that support both analytics and machine learning use cases.

Requirements

  • 8+ years of experience in data engineering with strong hands‑on work in PySpark and Python.
  • Deep experience with Databricks, Spark optimization, cluster tuning, and performance troubleshooting.
  • Strong experience working with Snowflake or similar cloud data warehouses.
  • Practical knowledge of workflow orchestration tools and dependency management.
  • Solid understanding of data modeling, ingestion frameworks, and distributed systems architecture.
  • Hands‑on experience implementing CI/CD for data and ML pipelines.
  • Strong experience with MLflow for managing the ML lifecycle.
  • Excellent communication skills with the ability to work across engineering and business teams.

Responsibilities

  • Design, develop, and maintain end‑to‑end ETL/ELT pipelines using Python and PySpark on Databricks.
  • Optimize Spark jobs for performance, scalability, and cost-efficiency in production environments.
  • Implement data quality frameworks including validation, reconciliation, and anomaly detection.
  • Build and manage orchestration workflows (Airflow / Databricks Workflows / equivalent).
  • Implement pipeline monitoring, logging, alerting, and observability for reliable operations.
  • Develop and operationalize ML workflows using MLflow (experiment tracking, model registry, packaging, deployment).
  • Build scalable data ingestion and data modeling solutions for analytics and ML use cases.
  • Collaborate with data scientists, platform teams, engineering stakeholders, and business partners.

Benefits

  • Paid time off based on employee grade (A-F), as defined by policy: vacation (12-25 days, depending on grade), company-paid holidays, personal days, and sick leave
  • Medical, dental, and vision coverage (or provincial healthcare coordination in Canada)
  • Retirement savings plans (e.g., 401(k) in the U.S., RRSP in Canada)
  • Life and disability insurance
  • Employee assistance programs
  • Other benefits as provided by local policy and eligibility