MLOps Engineer

Stanford Health Care•Newark, NJ

1d•$79 - $105

About The Position

If you're ready to be part of our legacy of hope and innovation, we encourage you to take the first step and explore our current job openings. Your best is waiting to be discovered.

Day - 08 Hour (United States of America)

We are seeking a high-caliber Senior AI Platform & MLOps Engineer to architect the "layered" infrastructure required for autonomous, agentic systems within Stanford Health Care. In this role, you will be the "Master Chef" of our AI ecosystem, combining expert-level DevOps (Kubernetes, Terraform, DevOps orchestration) with agentic application development (LangGraph, CrewAI, tool-calling logic). You won't just manage servers; you will build the robust, full-stack "factory" where multi-agent frameworks interact with healthcare APIs, ensuring every autonomous action is governed by strict MLOps observability (LangSmith, Arize) and safety guardrails. If you have the "crispy" coding skills to build RAG pipelines in Python, the "rich" architectural depth to deploy scalable microservices, and extensive full-stack software development expertise, we want you to lead the integration of reasoning-based AI into the future of clinical and business workflow automation.

This is a Stanford Health Care job.

A Brief Overview

The MLOps Engineer will play an integral role in incorporating Artificial Intelligence (AI) within Stanford Health Care. The solutions will impact patient care, medical research, and operational services. This group is tasked with innovating, building, deploying and monitoring production-grade AI, machine learning (ML) and predictive algorithms in healthcare. The role will partner closely with lead researchers in the AI field and leaders across various clinical specialties and operations. This role will report to the Infrastructure group and have a dotted-line relationship to the Data Science team.

The role will be responsible for maintaining cloud-based infrastructure-as-code repositories, maintaining infrastructure and deployment pipelines, and designing the security landscape for the team and objects. The role will set the standards for the full SDLC of projects for the Data Science team.

Locations: Stanford Health Care

Requirements

Proven experience as an MLOps Engineer.
Strong knowledge of cloud platforms such as AWS, Azure or Google Cloud and experience with infrastructure-as-code tools like Terraform or CloudFormation.
Proficiency in containerization technologies such as Docker and container orchestration platforms like Kubernetes.
Experience with CI/CD tools such as GitLab CI/CD, GitHub Actions or CircleCI.
Solid programming skills in languages such as Python, Rust or Go and experience in scripting and automation.
Familiarity with machine learning frameworks and libraries such as PyTorch, TensorFlow and scikit-learn.
Deep understanding of DevOps principles, agile methodologies and the software development lifecycle.
Strong problem-solving and troubleshooting skills, with the ability to analyze and resolve complex technical issues.
Excellent communication and collaboration skills with the ability to work effectively in cross-functional teams.

Responsibilities

Design, build and maintain scalable and robust infrastructure for AI/ML systems, including cloud-based environments, containerization and orchestration platforms.
Develop and implement CI/CD pipelines to automate the deployment, testing and monitoring of AI/ML models and applications.
Collaborate with data scientists, data engineers and software engineers to optimize model training, deployment and inference pipelines.
Monitor and troubleshoot AI/ML systems to ensure high availability, performance and reliability.
Maintain and monitor model training and inference pipelines across multi-cloud tenants, especially those for Large Language Models (LLMs).
Maintain Kubernetes pods, the container registry, the virtual machine image library and the model registry.
Monitor infrastructure utilization and costs pertaining to model training, inference and GPU utilization.
Implement best practices for security, data privacy and compliance in AI/ML workflows and infrastructure.
Evaluate and integrate new tools, technologies and frameworks to improve the efficiency and effectiveness of our MLOps processes.
Mentor and provide technical guidance to junior members of the organization.
Stay up to date with the latest advancements and trends in MLOps, DevOps and cloud technologies, and share them with the team.