Highlight is seeking an Azure Databricks Engineer to work remotely on a cloud development program supporting our federal customer.
Design, build, and maintain scalable ETL/ELT pipelines using Databricks Workflows, Delta Lake, and Apache Spark to process large volumes of structured and unstructured data
Architect and optimize data lakehouse solutions, implement data modeling best practices, and ensure efficient data storage and retrieval using Delta Lake format and partitioning strategies
Configure and manage Databricks workspaces, clusters, and compute resources while implementing security controls, access management, and cost optimization strategies
Develop streaming data solutions using Structured Streaming and implement batch processing jobs for data transformation, aggregation, and integration across multiple data sources
Collaborate with data scientists to deploy machine learning models, implement MLflow for model lifecycle management, and create automated ML pipelines for training and inference
Monitor data pipeline performance, troubleshoot issues, implement data quality checks, and optimize Spark jobs for improved efficiency and reduced processing costs
BA/BS/MS in Computer Science, Engineering, or Data Science, or equivalent experience
5+ years of experience as a Data Engineer, including 3+ years specifically with Azure Databricks and related Azure data services
Expertise in Apache Spark, Delta Lake, Python, SQL, and Scala
Hands-on experience designing and optimizing large-scale data pipelines
Experience with Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, and other Azure services
Strong understanding of ETL/ELT processes, data modeling, and big data technologies
Microsoft Certified: Azure Data Engineer Associate (DP-203), Databricks Certified Data Engineer, or an equivalent certification or experience
Experience with Agile methodology and with implementing data governance and security