Databricks Data Engineer

McKesson · Irving, TX

About The Position

McKesson is an impact-driven, Fortune 10 company that touches virtually every aspect of healthcare. We are known for delivering insights, products, and services that make quality care more accessible and affordable. Here, we focus on the health, happiness, and well-being of you and those we serve – we care. What you do at McKesson matters. We foster a culture where you can grow, make an impact, and are empowered to bring new ideas. Together, we thrive as we shape the future of health for patients, our communities, and our people. If you want to be part of tomorrow’s health today, we want to hear from you.

This role is responsible for designing and operating reliable, scalable data workflows on the Databricks platform, with a strong focus on data process monitoring, job optimization, and data quality. Candidates should have advanced Databricks expertise and a solid practical understanding of enterprise database systems (Oracle, PostgreSQL, MongoDB).

Requirements

  • Degree or equivalent experience; the role typically requires 4+ years of relevant experience.
  • 4+ years of hands-on experience with Databricks and Apache Spark in a cloud or enterprise setting.
  • Experience with data process monitoring tools, alerting automation, and dashboarding inside Databricks.
  • Advanced knowledge of Databricks jobs, job monitoring, error handling, and performance metrics tools.
  • Good understanding of database fundamentals, including SQL, table design, indexing, and troubleshooting (Oracle, PostgreSQL, MongoDB).
  • Experience building, documenting, and supporting reliable production-grade data workflows.
  • Proficient in Python and SQL for data engineering and automating monitoring/reporting tasks.
  • Bachelor’s degree in Computer Science, Information Systems, Engineering, or a related discipline.
  • Candidates must be authorized to work in the USA.
  • Sponsorship is not available for this role.

Nice To Haves

  • Understanding of Delta Lake and the lakehouse architecture (a brief sketch follows this list).
  • Experience with security, job orchestration, and production-support best practices in Azure cloud environments.
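
For context on the Delta Lake item above, here is a minimal sketch of a common lakehouse upsert pattern, not a McKesson-specific workflow. It assumes a Databricks runtime where `spark` and the `delta` package are already available; the path, table, and column names are purely illustrative.

```python
# Minimal Delta Lake upsert (MERGE) sketch.
# Assumes a Databricks runtime with `spark` predefined and delta available;
# the source path, target table, and key column are hypothetical.
from delta.tables import DeltaTable

updates = spark.read.parquet("/mnt/raw/orders/latest")   # hypothetical source path

target = DeltaTable.forName(spark, "analytics.orders")   # hypothetical Delta table

(
    target.alias("t")
    .merge(updates.alias("u"), "t.order_id = u.order_id")  # upsert on a business key
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```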

Responsibilities

  • Build, optimize, and maintain batch and streaming data pipelines using Databricks, Apache Spark, and Delta Lake for cloud analytics workloads (see the sketch after this list).
  • Monitor, troubleshoot, and report on the status and health of data pipelines and processing jobs using Databricks-native tools, logs, and dashboards to ensure timely and reliable data delivery.
  • Analyze and resolve job failures, resource bottlenecks, and data quality issues, escalating problems as needed and providing root-cause analysis.
  • Apply strong SQL and data modeling knowledge (from Oracle, PostgreSQL, MongoDB) when creating, transforming, and validating large data sets to support a variety of business and analytics use cases.
  • Implement and enforce data security controls, encryption, and access policies within Databricks, following industry best practices and healthcare compliance requirements.
  • Work with data governance, compliance, and IT security teams to continuously evaluate and improve system security, privacy, and regulatory alignment.
  • Document pipeline architecture, monitoring processes, and standard operating procedures for the data engineering team and other stakeholders.
  • Collaborate with business intelligence, analytics, and data operations teams to deliver high-quality data with consistent performance and availability.
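
As a rough illustration of the pipeline and data-quality responsibilities listed above, the following PySpark sketch shows a simple batch load with a basic quality gate before writing to a Delta table. It assumes a Databricks job context where `spark` is predefined; all paths, table names, columns, and thresholds are hypothetical, not part of McKesson's actual workflow.

```python
# Illustrative batch pipeline with a basic data-quality gate.
# Assumes `spark` is provided by the Databricks job/notebook context;
# source path, key column, and target table are hypothetical.
from pyspark.sql import functions as F

raw = spark.read.format("json").load("/mnt/landing/claims/")  # hypothetical landing zone

cleaned = (
    raw.dropDuplicates(["claim_id"])                          # hypothetical business key
       .withColumn("processed_at", F.current_timestamp())
)

# Simple data-quality check: fail the run if required fields are missing,
# so the failure surfaces in Databricks job monitoring and alerting.
null_count = cleaned.filter(F.col("claim_id").isNull()).count()
if null_count > 0:
    raise ValueError(f"Data quality check failed: {null_count} rows missing claim_id")

(
    cleaned.write.format("delta")
           .mode("append")
           .saveAsTable("analytics.claims")                   # hypothetical target table
)
```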