Principal AI / Machine Learning Data Engineer - Remote or hybrid from MN or DC

UnitedHealth Group | Eden Prairie, MN
$112,700 - $193,200 | Hybrid

About The Position

The Principal AI Data Engineer will design and build end-to-end AI pipelines for large-scale unstructured data, enabling advanced analytics, Generative AI, and investigative insights. This role will transform raw, complex datasets (such as scanned documents, images, PRFs, and other OCR-driven unstructured data sources) into AI-ready, searchable, and model-integrated data products. You will play a key role in building LLM-powered systems (e.g., RAG, semantic search, summarization, and insight extraction) and scaling them into production environments. This position sits at the intersection of data engineering and AI, with an emphasis on building modern data pipelines and enabling production-grade AI capabilities. You’ll enjoy the flexibility to work remotely from anywhere within the U.S. as you take on some tough challenges. For all hires in the Minneapolis or Washington, D.C. area, you will be required to work in the office a minimum of four days per week.

Requirements

  • Bachelor’s degree or equivalent experience
  • 5+ years of experience designing, building, and operating scalable data pipelines and platforms (batch + streaming)
  • 2+ years of experience deploying Generative AI solutions to production (e.g., RAG, LLM-powered pipelines, semantic search)
  • Proven, solid hands-on development experience in Python and SQL, with experience in Spark/PySpark and Databricks (or similar distributed platforms)
  • Experience building ingestion and processing frameworks for unstructured data (OCR, documents, images), including parsing and enrichment
  • Experience with cloud platforms (AWS/Azure/GCP), DevOps/CI/CD, and infrastructure-as-code, including secure handling of sensitive data (PII/PHI)
  • Proven ability to design scalable solutions, implement data quality/observability practices, and collaborate across stakeholders
  • Solid hands-on engineering in Python and SQL; familiarity with JVM languages (Java/Scala) in Spark ecosystems
  • Familiarity with security and privacy concepts for data platforms (e.g., least privilege, PII/PHI handling) and working with compliance partners

Nice To Haves

  • Experience with cloud platforms such as AWS, Azure, or Google Cloud, including managed data services
  • Experience with streaming and event-driven architectures (e.g., Kafka, Kinesis, Event Hubs)
  • Experience with data quality and validation frameworks (e.g., Great Expectations, Deequ) and/or data observability tooling
  • Experience enabling MLOps practices (e.g., feature stores, model registries, experiment tracking, deployment automation)
  • Experience with lakehouse architectures, Delta Lake, and advanced Spark optimization/performance tuning
  • Experience with data visualization tools and libraries such as Plotly, seaborn, and Chart.js
  • Experience with machine learning and predictive analytics

Responsibilities

  • Design, develop, and maintain scalable data pipelines and data platforms supporting analytics, machine learning, and AI use cases
  • Build and optimize ingestion frameworks for large-scale structured and unstructured data, including streaming and event-driven sources
  • Partner with cross-functional stakeholders to understand evolving data and AI needs and define long-term technical solutions
  • Enable and support machine learning and AI workflows, including feature engineering, data preparation, and model deployment support
  • Drive strategic initiatives around Generative AI, data quality, observability, lineage, and governance
  • Develop and maintain frameworks that support rapid experimentation and deployment of AI/ML solutions
  • Introduce and evolve best practices in data modeling, orchestration, testing, and monitoring
  • Identify and champion opportunities for platform scalability, performance optimization, and cost efficiency
  • Collaborate with product, analytics, and infrastructure teams to deliver high-impact data and AI solutions
  • Build and maintain reusable parsing, enrichment, analytic, and service libraries to accelerate delivery across teams
  • Work comfortably under time-sensitive conditions while ensuring thoroughness
  • Maintain high ethical standards and the ability to remain objective and confidential
  • Build and operate production data platforms and pipelines across batch and streaming workloads
  • Do hands-on engineering in Python and SQL, and in JVM languages (Java/Scala) within Spark ecosystems
  • Utilize distributed processing and lakehouse/warehouse patterns (e.g., Spark/PySpark, Databricks, Snowflake)
  • Build pipelines for OCR, document parsing, and text extraction from image-based or scanned data sources
  • Enable Generative AI solutions in production (e.g., RAG-style architectures), including retrieval patterns and evaluation/monitoring practices
  • Take knowledge-centric data approaches (e.g., metadata-driven systems, entity resolution, and/or graph concepts) to improve discoverability and downstream analytics
  • Implement data quality, observability, and monitoring (profiling, validation, alerting, and reliability improvements)
  • Utilize orchestration, CI/CD, containerization, and infrastructure-as-code (e.g., Airflow, GitHub Actions, Docker, Terraform, Kubernetes)
  • Work in the cloud (AWS, Azure, and/or GCP), including secure handling of sensitive data (PII/PHI) and collaboration with compliance partners
  • Lead through influence, mentor engineers, and translate ambiguous problems into scalable technical roadmaps

Benefits

  • Comprehensive benefits package
  • Incentive and recognition programs
  • Equity stock purchase
  • 401(k) contribution