Machine Learning Operations Engineer

Augment Professional Services · Houston, TX
Posted 1d ago · $65 - $82 · Hybrid

About The Position

The MLOps Engineer is responsible for designing, deploying, and maintaining scalable machine learning solutions in production across multi-cloud and data platform environments. This role plays a critical part in operationalizing machine learning models by building robust pipelines, enabling automation, and ensuring reliability, performance, and governance across AWS, Microsoft Azure, and Snowflake ecosystems.

Working closely with data scientists, data engineers, and cloud platform teams, the MLOps Engineer bridges the gap between model development and production deployment. This position focuses on creating secure, scalable, and cost-efficient ML platforms that support end-to-end lifecycle management, including model training, deployment, monitoring, and continuous improvement. The ideal candidate brings strong experience in cloud-native architectures, CI/CD automation, and production-grade ML systems, with hands-on expertise in AWS, Azure, and Snowflake environments.

Requirements

  • Minimum of 5 years of experience in MLOps, machine learning engineering, platform engineering, or DevOps
  • Hands-on experience with AWS, Microsoft Azure, and Snowflake in building or supporting production ML/data platforms
  • Strong programming skills in Python and SQL
  • Experience deploying and managing machine learning models in production environments
  • Experience with cloud ML services such as AWS SageMaker and Azure Machine Learning
  • Experience building and integrating data pipelines with Snowflake
  • Proficiency with CI/CD pipelines, infrastructure automation, and model versioning
  • Experience with containerization and orchestration tools such as Docker and Kubernetes
  • Experience with workflow orchestration tools such as Apache Airflow, Azure Data Factory, or similar
  • Familiarity with monitoring, logging, alerting, and observability frameworks
  • Strong understanding of data engineering concepts, APIs, and distributed systems
  • Proven troubleshooting, communication, and cross-functional collaboration skills

Nice To Haves

  • Master’s or PhD in Computer Science, Computer Engineering, or a related technical field
  • Experience with Snowflake Cortex AI, Snowpark, or machine learning workloads within Snowflake
  • Experience with generative AI platforms such as AWS Bedrock or Azure OpenAI
  • Experience building real-time inference systems, event-driven architectures, or serverless pipelines
  • Familiarity with feature stores, vector databases, and retrieval-augmented generation (RAG) systems
  • Experience with infrastructure-as-code tools such as Terraform, AWS CloudFormation, or Azure Resource Manager
  • Understanding of security, compliance, and governance frameworks in regulated environments
  • Experience implementing A/B testing, shadow deployments, and advanced model release strategies

Responsibilities

  • Design and implement end-to-end machine learning pipelines including data ingestion, feature engineering, model training, validation, deployment, and monitoring
  • Deploy and manage machine learning models in production across AWS, Azure, and Snowflake platforms
  • Build and maintain batch and real-time inference pipelines using cloud-native and platform-native services
  • Develop and automate CI/CD pipelines for model packaging, testing, deployment, and rollback
  • Integrate ML workflows with services such as AWS SageMaker, AWS Lambda, Azure Machine Learning, Azure Data Factory, and Snowflake
  • Build and manage orchestration workflows using tools such as Apache Airflow, Azure Data Factory, or similar platforms
  • Implement model lifecycle management practices including experiment tracking, model registry, and governance frameworks
  • Monitor model performance, including accuracy, drift, latency, throughput, and pipeline reliability
  • Establish and manage deployment strategies such as canary releases, blue-green deployments, shadow testing, and rollback mechanisms
  • Collaborate cross-functionally to transition machine learning models from research to production environments
  • Ensure security, compliance, traceability, and access controls across data and ML systems
  • Optimize performance, scalability, and cost efficiency across cloud and data platforms
  • Document architecture designs, deployment standards, and operational procedures