Data Engineer III

Agile Defense • Doral, FL

About The Position

Build Scalable Data & ML Infrastructure
Design and implement medallion architecture (Bronze/Silver/Gold) using Databricks for reliable data processing and ML model training
Develop automated data pipelines that process structured and unstructured data from multiple sources into analytics-ready formats
Create robust ETL/ELT workflows using Apache Spark and modern data engineering practices for both batch and streaming data
Build and maintain data quality monitoring and validation systems across the entire data and ML lifecycle

Drive ML Platform Excellence
Implement MLOps best practices, including automated model training, validation, deployment, and monitoring, using MLflow and Databricks Workflows
Design scalable ML inference systems that handle high-volume, low-latency predictions in production environments
Create comprehensive monitoring and alerting systems for model performance, data drift, and system health
Build self-service ML capabilities that enable data scientists to deploy and monitor their own models efficiently

Enable Advanced Analytics & Business Intelligence
Design and maintain data models that support both machine learning workloads and business intelligence requirements
Create integration points between ML systems and business intelligence platforms (Tableau, Power BI, Qlik Sense)
Implement data governance standards and metadata management systems that ensure data quality and compliance
Collaborate with analysts and data scientists to optimize data architecture for both predictive modeling and reporting needs

Ensure Data Quality & Governance
Implement comprehensive data governance frameworks, including data lineage tracking, quality monitoring, and compliance controls
Design and maintain data catalogs and metadata management systems that enable efficient data discovery across the organization
Establish data quality standards and automated testing frameworks for both analytical and ML workloads
Work with stakeholders to establish data definitions, business logic, and governance policies

Integrate with Enterprise Systems
Build integrations with MAVEN Smart System (Palantir Foundry) environments to support operational and predictive analytics
Connect Databricks-based systems with enterprise data warehouses, streaming platforms, and business applications
Implement security and compliance controls that meet enterprise requirements while enabling self-service capabilities
Collaborate with platform engineers to integrate ML systems with broader application architecture and infrastructure

Requirements

5+ years of technical experience, including 3+ years building production data pipelines and ML infrastructure on distributed computing platforms such as Databricks
Strong data engineering skills in Python, PySpark, and Spark SQL, with experience implementing medallion architecture and modern data platform patterns
Production ML systems experience, including model deployment, monitoring, and MLOps practices in cloud environments
Data architecture expertise, with experience designing scalable data processing systems and implementing data governance frameworks
Experience integrating with platforms such as Qlik, Tableau, Power BI, MAVEN Smart System (Palantir), or similar