Principal MLOps Engineer

SOLVENTUM
Pittsburgh, PA
$142,800 - $196,350
Remote

About The Position

Solventum is a new healthcare company with a long legacy of creating breakthrough solutions for our customers’ toughest challenges. We pioneer game-changing innovations at the intersection of health, material and data science that change patients’ lives for the better while enabling healthcare professionals to perform at their best.

As a Principal MLOps Engineer, you will lead the operational architecture, deployment strategy, and reliability engineering for integrating AI into high-stakes Healthcare Information Systems (HIS). You will define the enterprise operational standards, govern the release processes, and build the resilient infrastructure required to maintain models in mission-critical clinical environments. You are the definitive authority on production discipline, compliance support, and incident resolution for the AI organization.

Requirements

  • Bachelor's Degree or Higher in Computer Science, Software Engineering, or related technical field.
  • 10+ years of experience in software engineering, with at least 6 years dedicated to deploying and maintaining large-scale ML systems in production (not just research or POCs).
  • Expert-level experience with Cloud Providers (AWS/GCP/Azure) and orchestration tools (Kubernetes, Kubeflow, or Airflow).
  • Expert-level proficiency in Python and in Java, Go, or a similar language.
  • Deep proficiency in backend frameworks, microservices, and system design patterns.
  • Expert knowledge of monitoring stacks (Prometheus, Grafana, Datadog) and establishing enterprise SLAs/SLOs for AI services.
  • Proven track record of designing automated deployment pipelines, managing complex rollback procedures, and enforcing model registry governance at scale.
  • Must be legally authorized to work in the country of employment without sponsorship for employment visa status (e.g., H-1B status).

Nice To Haves

  • Master’s or PhD in Computer Science, Software Engineering, or related technical field.
  • Deep understanding of cybersecurity best practices and ATO (Authority to Operate) processes within regulated industries (Healthcare, Finance, or Defense).
  • Proven ability to design systems that handle massive concurrency and distributed data processing.

Responsibilities

  • Architect and govern the comprehensive release process, defining enterprise checklists, automated approval gates, release notes, and deployment readiness standards.
  • Establish the deployment execution standards for promoting AI across all environments and ensure customer deployments adhere to strict internal production discipline.
  • Architect and oversee the enterprise model registry, ensuring seamless integration with CI/CD pipelines and full version control traceability.
  • Define and enforce monitoring standards, establishing critical SLAs/SLOs, service health metrics, and comprehensive dashboards across the AI ecosystem.
  • Architect automated checks for input/output data quality and model drift, ensuring proactive detection of system degradation.
  • Establish and lead the production incident process, including rigorous triage workflows, severity escalation paths, postmortems, rollback mechanisms, and recovery infrastructure.
  • Partner with Platform teams to provide essential ATO (Authority to Operate) and compliance support, ensuring complete deployment traceability and strict operational controls.
  • Oversee comprehensive operational reporting, providing leadership with status updates across production systems, pre-prod testing, customer rollouts, and incident metrics.
  • Foster a culture of production discipline, guiding junior engineers in maintaining operational runbooks and reliable deployment pipelines.

Benefits

  • Medical, Dental & Vision
  • Health Savings Accounts
  • Health Care & Dependent Care Flexible Spending Accounts
  • Disability Benefits
  • Life Insurance
  • Voluntary Benefits
  • Paid Absences
  • Retirement Benefits