MLOps Platform Engineer

CGI•Reston, VA

$107,700 - $154,300•Hybrid

About The Position

CGI has an immediate need for an MLOps Platform Engineer to join our team. This is an exciting opportunity to work in a fast-paced team environment supporting one of our largest customers. We take an innovative approach to supporting our client, working side by side in an agile environment using emerging technologies. We partner with 15 of the top 20 banks globally, and our top 10 banking clients have worked with us for an average of 26 years. This role is located at a client site in Reston, VA. A hybrid working model is acceptable.

The Data Modeling, Analytics & AI Engineering team is seeking a hands-on MLOps Platform Engineer to design, build, and operate enterprise-grade machine learning platforms. This role focuses on enabling scalable, secure, and reliable ML model development and deployment across AWS cloud environments and Kubernetes (EKS) clusters. You will play a key role in engineering and supporting infrastructure for ML training, batch inference, and real-time model serving.

The position requires strong platform engineering fundamentals, CI/CD automation expertise, and experience operating containerized workloads in production environments. You will work closely with Data Scientists, ML Engineers, and application teams to operationalize end-to-end ML solutions while ensuring performance, governance, and cost efficiency in a regulated enterprise environment.

Requirements

7+ years of experience with AWS services including EKS, EC2, S3, IAM, CloudWatch, and ECR
Strong operational knowledge of Kubernetes, preferably AWS EKS
Experience designing and managing containerized workloads (Docker)
Proficiency in Python and Bash scripting
Experience building and maintaining CI/CD pipelines (GitLab or equivalent)
Familiarity with ML workflows including training, inference, and model monitoring
Experience with Infrastructure as Code (Terraform or CloudFormation)
Experience supporting production platforms, including incident response and root cause analysis
Strong understanding of RBAC, network policies, and multi-tenant Kubernetes designs
Knowledge of monitoring, logging, observability, and performance tuning practices
Experience with ML platforms such as Domino or Amazon SageMaker
Familiarity with MLflow or similar ML lifecycle tools
Experience supporting GPU-based workloads or distributed training
Understanding of enterprise MLOps architecture patterns (batch, real-time, microservices)
Exposure to data processing frameworks and feature pipelines