Senior Machine Learning Ops Engineer

McKesson • Irving, TX

About The Position

As a Senior Machine Learning Ops Engineer at McKesson, you will be instrumental in building, deploying, and maintaining robust and scalable machine learning systems. You will bridge the gap between data science and operations, ensuring our AI/ML models are seamlessly integrated into production environments, monitored effectively, and continuously optimized to deliver maximum business value.

Requirements

Bachelor's or Master's degree in Computer Science, Engineering, Data Science, or a related quantitative field.
5+ years of experience in software engineering, DevOps, or MLOps roles, with a strong focus on machine learning systems.
Proficiency in at least one major programming language (e.g., Python, Java, Scala), with extensive experience in Python for ML workflows.
Hands-on experience with MLOps platforms and tools (e.g., MLflow, Kubeflow, SageMaker, Azure ML, Google AI Platform).
Strong understanding of machine learning concepts, algorithms, and model lifecycle management.
Experience with cloud platforms (Azure, AWS, or GCP) and their ML-related services.
Proficiency with containerization technologies (Docker) and orchestration tools (Kubernetes).
Solid understanding of CI/CD principles and experience with tools such as Jenkins, GitLab CI, or Azure DevOps.
Experience with data pipeline tools and technologies (e.g., Apache Spark, Kafka, Airflow).
Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
Excellent problem-solving skills, with a focus on building scalable and resilient systems.
Strong communication and collaboration skills, with the ability to work effectively across cross-functional teams.
A degree or equivalent; this level typically requires 7+ years of relevant experience.

Nice To Haves

Azure experience is a plus.

Responsibilities

Design, develop, and implement end-to-end MLOps pipelines for the deployment, monitoring, and management of machine learning models in production.
Collaborate closely with data scientists to understand model requirements, optimize model performance for production, and ensure efficient model handoffs.
Build and maintain automated CI/CD pipelines for ML models, enabling rapid iteration and reliable deployment.
Implement robust monitoring, logging, and alerting systems for ML models, tracking performance, data drift, and model decay.
Develop and manage scalable infrastructure for ML model training and inference, leveraging cloud platforms (e.g., Azure, AWS, GCP).
Ensure the security, reliability, and compliance of ML systems, adhering to industry best practices and McKesson's internal standards.
Containerize ML applications and services using Docker and orchestrate deployments with Kubernetes.
Evaluate and integrate new MLOps tools and technologies to improve efficiency and capabilities.
Provide technical leadership and mentorship to junior engineers, fostering best practices in MLOps.
Troubleshoot and resolve complex issues related to ML model deployment, performance, and infrastructure.