We are seeking an experienced MLOps Engineer with strong expertise in Python and big data technologies to join our team. This role focuses on operational excellence, including optimizing feature engineering pipelines and maintaining machine learning models in production environments. The successful candidate will work closely with platform and data science teams to ensure scalable, reliable, and high-performance ML workflows using existing frameworks. This position will be performed onsite five days a week at our client site in Strongsville, OH.

Future duties and responsibilities

- Optimize and maintain large-scale feature engineering pipelines using PySpark, Pandas, and PyArrow on Hadoop-based infrastructure.
- Refactor and modularize ML codebases to enhance reusability, maintainability, and performance.
- Collaborate with platform teams on compute capacity planning, resource allocation, and system upgrades.
- Integrate with existing model serving frameworks to support testing, deployment, and rollback processes.
- Monitor and troubleshoot production ML pipelines, ensuring high reliability, low latency, and cost efficiency.
- Contribute to internal ML platforms by sharing insights, proposing improvements, and documenting best practices.
- Build near real-time ML pipelines using Kafka and Spark Streaming.
- Work within the AWS and SageMaker MLOps ecosystem.

Required qualifications to be successful in this role

- 6+ years of experience in software engineering, data engineering, or MLOps roles.
- Strong programming expertise in Python, with hands-on experience in Pandas, PySpark, and PyArrow.
- Deep understanding of the Hadoop ecosystem, distributed computing, and performance tuning.
- Experience with CI/CD pipelines and best practices in ML environments.
- Hands-on experience with monitoring tools for ML pipeline health and performance.
- Strong collaboration skills with experience working in cross-functional teams (platform, data science, engineering).
- Experience contributing to or building internal MLOps frameworks/platforms.
- Familiarity with SLURM clusters or other distributed job schedulers.
- Exposure to Kafka, Spark Streaming, or other real-time data processing technologies.
- Understanding of ML lifecycle management, including versioning, deployment, and drift detection.
Job Type: Full-time
Career Level: Mid Level
Education Level: No Education Listed
Number of Employees: 5,001-10,000 employees