MLOps Engineer II

ICW Group, San Diego, CA
$105,780 - $189,348, Hybrid

About The Position

Are you looking to make an impactful difference in your work, yourself, and your community? Why settle for just a job when you can land a career? At ICW Group, we are hiring team members who are ready to use their skills, curiosity, and drive to be part of our journey as we strive to transform the insurance carrier space. We're proud to have been in business for over 50 years, and it's change agents like you who will help us continue to deliver on our mission to create the best insurance experience possible.

Headquartered in San Diego with regional offices located throughout the United States, ICW Group has been named a Top 50 performing P&C organization for ten consecutive years, offering the stability of a large, profitable, and growing company combined with a focus on all things people. It's our team members who make us an employer of choice and the vibrant company we are today. We strive to make both our internal and external communities better every day!

Purpose Of The Job

The MLOps Engineer II is responsible for designing, developing, and operating scalable machine learning infrastructure and deployment pipelines on AWS. The MLOps Engineer II works closely with data scientists, cloud engineers, and application teams to productionize machine learning models and ensure reliable, secure, and cost-efficient ML operations. This position applies advanced software engineering and cloud development practices to automate machine learning workflows, optimize infrastructure utilization, and maintain production ML systems. This role requires strong coding skills, experience working with ML systems in production, and the ability to independently implement technical solutions that support the organization’s AI and analytics initiatives.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, Data Science, or a related technical field, or an equivalent combination of education and experience.
  • A minimum of 3 to 5 years of experience in software engineering, DevOps, cloud engineering, or MLOps roles.
  • Strong programming experience in Python for building automation, services, or data processing pipelines.
  • Hands-on experience with AWS cloud services, including SageMaker, Lambda, Step Functions, S3, IAM, and CI/CD tools.
  • Experience designing and deploying machine learning models into production environments.
  • Experience building and maintaining CI/CD pipelines and automated deployment workflows.
  • Experience working with Infrastructure-as-Code tools such as CloudFormation, Terraform, or AWS CDK.
  • Strong troubleshooting and problem-solving skills in distributed or cloud-based systems.
  • Experience collaborating with cross-functional teams including data science, engineering, and business stakeholders.

Nice To Haves

  • Experience with containerization technologies such as Docker and container orchestration platforms (ECS or EKS).
  • Experience with ML observability tools, feature stores, or data/model versioning platforms.
  • Familiarity with AI/ML cost optimization strategies and FinOps practices.
  • AWS certifications such as Solutions Architect, DevOps Engineer, or Machine Learning Specialty.
  • Experience operating ML systems in regulated industries or environments handling sensitive data.
  • Experience designing scalable ML platforms or shared ML infrastructure.

Responsibilities

  • Design, develop, and maintain scalable machine learning pipelines using AWS services such as SageMaker, Lambda, Step Functions, and S3.
  • Build and manage deployment frameworks for machine learning models in real-time and batch inference environments.
  • Develop and maintain Python-based tools and services for data processing, model packaging, and ML pipeline orchestration.
  • Design and implement CI/CD pipelines for machine learning systems using GitHub and AWS development tools.
  • Develop and manage infrastructure components using Infrastructure-as-Code tools such as AWS CloudFormation, Terraform, or AWS CDK.
  • Implement monitoring, logging, and alerting solutions to ensure reliability and observability of ML systems in production.
  • Troubleshoot and resolve complex issues in ML development and production environments.
  • Partner with data scientists and engineering teams to integrate machine learning models into enterprise applications and data platforms.
  • Lead implementation of AI/ML FinOps best practices, analyzing resource usage and optimizing compute, storage, and infrastructure costs for ML workloads.
  • Monitor AWS usage, budgets, and cost trends related to ML infrastructure and implement optimization strategies to improve cost efficiency.
  • Improve automation, reliability, and scalability of ML pipelines and operational workflows.
  • Ensure ML systems comply with enterprise security, governance, and regulatory standards in coordination with Information Security teams.
  • Participate in architectural discussions and contribute to technical standards for MLOps and ML infrastructure.
  • Provide technical guidance and mentorship to junior engineers and contribute to knowledge sharing within the team.
  • Conduct code reviews and promote best practices in software engineering, testing, and deployment.