About The Position

At Rackner, we are building the operational backbone that turns AI/ML capability into real-world mission outcomes. We are seeking an MLOps Engineer to own the lifecycle of AI/ML systems, from experimentation to deployment, within a mission-critical, classified environment supporting Air Force and NASIC-aligned programs. This is not a research role; it is where models become reliable, deployable, auditable systems. You will operate at the intersection of machine learning, distributed systems, and cloud-native infrastructure, and ensure that AI/ML systems work in environments where failure is not an option.

Requirements

  • Experience deploying ML systems into production environments
  • Strong background in Python and ML frameworks (PyTorch, TensorFlow, etc.)
  • Hands-on experience with ML pipeline orchestration tools (Kubeflow, Airflow, Argo)
  • Hands-on experience with experiment tracking tools (MLflow, ClearML)
  • Experience with Kubernetes and containerized workloads
  • Familiarity with CI/CD for ML systems
  • Understanding of distributed systems and scalable architectures
  • Experience working with LLMs or transformer-based models
  • Experience with computer vision systems (YOLO, Faster R-CNN)
  • Systems thinker who values reliability over novelty
  • Comfortable operating in ambiguous, high-stakes environments
  • Able to translate experimental work into operational capability

Responsibilities

  • Build and operate production-grade ML pipelines
  • Orchestrate workflows using Kubeflow, Airflow, or Argo
  • Implement model versioning, lineage, and reproducibility standards
  • Deploy models into mission environments (including constrained or classified systems)
  • Transition workflows from Jupyter experimentation → containerized pipelines → production systems
  • Enable both batch and real-time inference architectures
  • Design systems for reproducibility, auditability, and stability
  • Implement monitoring for model performance and drift
  • Implement monitoring for system health and latency
  • Use tools like Prometheus, Grafana, and OpenTelemetry
  • Deploy and manage Kubernetes-based ML workloads
  • Containerize pipelines using Docker / OCI standards
  • Scale compute for training and inference workloads
  • Enable data versioning and governance (lakeFS or similar)
  • Support feature engineering and dataset preparation pipelines
  • Apply metadata standards (e.g., STAC) where applicable
  • Develop runbooks, playbooks, and deployment standards
  • Build systems that can be operated by others, not just understood by you
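To make the monitoring responsibility concrete, here is a minimal, standard-library-only sketch of a drift check of the kind this role would operationalize. The `mean_shift_alert` helper, its names, and the 2-sigma threshold are illustrative assumptions, not part of the program's actual stack; in practice such a check would feed a Prometheus metric or alerting rule.

```python
import statistics

def mean_shift_alert(baseline, live, threshold=2.0):
    """Flag drift when the live feature mean departs from the
    training-time baseline mean by more than `threshold` baseline
    standard deviations. (Hypothetical helper for illustration.)"""
    mu = statistics.fmean(baseline)
    sigma = statistics.stdev(baseline)
    if sigma == 0:
        # Degenerate baseline: any change in the live mean is drift.
        return statistics.fmean(live) != mu
    z = abs(statistics.fmean(live) - mu) / sigma
    return z > threshold

# Example: a live window whose mean has shifted well past the
# baseline distribution trips the alert; an unshifted one does not.
baseline = [1.0, 1.1, 0.9, 1.05, 0.95]
print(mean_shift_alert(baseline, [2.0, 2.1, 1.9]))   # drifted window
print(mean_shift_alert(baseline, baseline))          # stable window
```

In a production pipeline the same comparison would typically run per feature over sliding windows, with the score exported to the monitoring stack rather than printed.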

Benefits

  • 100% covered certifications & training aligned to your role
  • 401(k) with 100% match up to 6%
  • Highly competitive PTO
  • Comprehensive Medical, Dental, Vision coverage
  • Life Insurance + Short & Long-Term Disability
  • Home office & equipment plan
  • Industry-leading weekly pay schedule

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Education Level: No Education Listed
  • Number of Employees: 1-10 employees
