SCI Preferred)

Rackner • Dayton, OH

Hybrid

About the Position

At Rackner, we build systems where advanced technologies move beyond prototypes and into real-world operational use. We are seeking an MLOps Engineer to support the deployment and lifecycle management of AI/ML systems within a secure, mission-focused environment. This is not a research role: this is where models become reliable, deployable, and auditable systems. You will operate at the intersection of machine learning, cloud-native infrastructure, and distributed systems, ensuring that AI/ML systems are production-ready in environments where reliability and performance matter.

Requirements

Experience deploying ML systems into production environments
Strong programming skills in Python
Hands-on experience with ML pipeline tools (Kubeflow, Airflow, Argo)
Hands-on experience with experiment tracking tools (MLflow, ClearML)
Experience with Kubernetes and containerized systems (Docker)
Familiarity with CI/CD pipelines
Understanding of distributed systems and scalable architectures
Experience working with LLMs or transformer-based models
Experience working with computer vision systems (YOLO, Faster R-CNN)
Focus on deployment and integration, not pure research
Systems thinker who prioritizes reliability over novelty
Comfortable operating in complex, evolving environments
Focused on delivering real-world outcomes
Active TS/SCI clearance strongly preferred
Candidates with an active Secret clearance may be considered and supported through an upgrade to TS/SCI
Candidates without an active clearance must be U.S. citizens, eligible to obtain and maintain a clearance, and able to work in a CAC-enabled or secure environment

Responsibilities

Own the ML Lifecycle (End-to-End)
Build and operate production-grade ML pipelines
Orchestrate workflows using Kubeflow, Airflow, or Argo
Implement model versioning, lineage, and reproducibility standards

Operationalize AI/ML Systems
Deploy models into secure and constrained environments
Transition workflows from experimentation → containerized pipelines → production systems
Enable both batch and real-time inference architectures

Engineer for Reliability
Design systems for reproducibility, auditability, and stability
Monitor model performance and system health using Prometheus, Grafana, and OpenTelemetry
Detect and resolve issues such as model drift and system degradation

Build Cloud-Native ML Infrastructure
Deploy and manage Kubernetes-based ML workloads
Containerize pipelines using Docker
Support scalable training and inference workflows

Establish Data Discipline
Support feature engineering and dataset preparation
Implement data versioning and governance practices (e.g., lakeFS)
Apply metadata and data management standards

Create Repeatable Systems
Develop runbooks, playbooks, and documentation
Build systems that are operationally sustainable and transferable