We are seeking a highly skilled and experienced MLOps Engineer to join our team and contribute to the development and maintenance of our ML platform, both on premises and on AWS Cloud. As an MLOps Engineer, you will be responsible for architecting, deploying, and optimizing the ML and data platform that supports the training of machine learning models on NVIDIA DGX clusters and Kubernetes, using technologies such as Helm, ArgoCD, Argo Workflows, Prometheus, and Grafana. Your expertise in AWS services such as EKS, EC2, VPC, IAM, S3, and EFS will be crucial to keeping our ML infrastructure running smoothly and scaling reliably. You will work closely with cross-functional teams, including data scientists, software engineers, and infrastructure specialists. Your expertise in MLOps and DevOps and your knowledge of GPU clusters will be vital in enabling efficient training and deployment of ML models.
Job Type: Full-time
Career Level: Mid Level
Industry: Computer and Electronic Product Manufacturing
Education Level: Master's degree
Number of Employees: 5,001-10,000 employees