Data Engineer

Definitive Healthcare, US
Framingham, MA
$69,000 - $129,000

About The Position

At Definitive Healthcare (NASDAQ: DH), we’re passionate about turning data, analytics, and expertise into meaningful intelligence that helps our customers achieve success and shape the future of healthcare. We empower them to uncover the right markets, opportunities, and people, paving the way for smarter decisions and greater impact. Headquartered just outside of Boston, Massachusetts, Definitive Healthcare operates across North America, Europe, and India, and has grown to serve a global client base of more than 2,400 customers since our founding in 2011.

We’re also a great place to work. In 2024 and 2025, we earned multiple workplace honors, including Built In’s 100 Best Places to Work in Boston (both years), a Stevie Bronze Award for Great Employers, and recognition as a Great Place to Work in India. We foster a collaborative, inclusive culture where diverse perspectives drive innovation. Through programs like DefinitiveCares and our employee-led affinity groups, we strive to promote connection, education, and inclusion.

We are looking for a Data Engineer who is passionate about building scalable data pipelines, working with complex healthcare datasets, and contributing to a modern, cloud‑native data architecture. If you thrive in a fast‑paced, data‑driven environment and have strong experience with Python, Spark, Databricks, AWS, SQL, and related technologies, we’d love to hear from you.

Requirements

  • Technical Skills:
    • Strong programming experience in SQL and Python or Scala
    • Hands‑on experience with Apache Spark and Databricks
    • Experience with Apache Airflow or similar orchestration tools
    • Knowledge of data cleansing, curation, and quality frameworks
    • Familiarity with Unity Catalog or other metadata management tools
    • Understanding of data governance, security, and compliance best practices
    • Experience working with AWS cloud services
    • Proficiency with CI/CD tools (Jenkins, GitLab CI, etc.)
    • Experience tuning Spark jobs and JVM‑based applications
    • Experience implementing or working within a Medallion architecture
  • Soft Skills:
    • Strong analytical and problem‑solving abilities
    • Excellent communication and cross‑functional collaboration skills
    • Ability to work independently and within a team environment
    • High attention to detail and commitment to quality

Nice To Haves

  • AWS certifications (e.g., AWS Certified Data Analytics)
  • Experience with SQL and NoSQL databases
  • Background in a fast‑paced, data‑centric SaaS or healthcare environment

Responsibilities

  • Design and Develop Data Pipelines:
    • Develop and maintain robust data pipelines using Python, Spark, Databricks, SQL, and SSIS
    • Implement and orchestrate ETL/ELT workflows using Apache Airflow and SSIS
    • Build reliable, repeatable processes that support the ingestion and transformation of large healthcare datasets
  • Data Integration and Management:
    • Integrate data from diverse sources (AWS, on‑prem, third‑party vendors) into our enterprise data platform
    • Work with a wide range of file formats including CSV, XML, Parquet, Delta, and more
    • Apply strong data quality, cleansing, and curation practices to ensure accuracy and consistency
    • Optimize storage and compute resources for performance, cost, and scalability
    • Automate observability and monitoring across data pipelines and workloads
  • Metadata Management and Governance:
    • Implement and manage Unity Catalog for metadata, lineage, and access control
    • Ensure adherence to data governance, security, and privacy standards
    • Maintain clear documentation, data dictionaries, and lineage tracking
    • Contribute to automation of data observability and governance workflows
  • Performance Tuning and Troubleshooting:
    • Tune and optimize Spark jobs for speed, reliability, and cost efficiency
    • Diagnose and resolve performance bottlenecks across distributed systems
    • Apply JVM tuning and Spark optimization techniques to improve throughput
  • Data Maturity Lifecycle:
    • Support and enhance our Medallion architecture (bronze/silver/gold) to improve data quality and usability
    • Ensure data is processed, enriched, and validated at each stage of the lifecycle
  • Collaboration and Continuous Improvement:
    • Partner with data scientists, analysts, product teams, and business stakeholders to understand data needs
    • Implement CI/CD pipelines to streamline deployment and testing of data assets
    • Stay current with emerging technologies and bring forward recommendations to evolve our data platform

Benefits

  • Depending on the position, employees may also be eligible to participate in a company bonus or commission plan.
  • All employees are eligible for a comprehensive benefits package, including medical, dental, and vision coverage, unlimited paid time off, and participation in the company’s 401(k) plan with employer contribution.
  • Industry leading products
  • Work hard and have fun doing it
  • Incredibly fast growth means limitless opportunity
  • Flexible and dynamic culture
  • Work alongside some of the most talented and dedicated teammates
  • Definitive Cares, our community service group, gives all of us a chance to give back
  • Competitive benefits package including great healthcare benefits and a 401(k) match