About The Position

At Rackner, we build systems where advanced technologies move beyond prototypes and into real-world operational use. We are seeking an MLOps Engineer to support the deployment and lifecycle management of AI/ML systems within a secure, mission-focused environment. This is not a research role. This is where models become reliable, deployable, and auditable systems. You will operate at the intersection of machine learning, cloud-native infrastructure, and distributed systems, and ensure AI/ML systems are production-ready in environments where reliability and performance matter.

Requirements

  • Experience deploying ML systems into production environments
  • Strong programming skills in Python
  • Hands-on experience with ML pipeline tools (Kubeflow, Airflow, Argo)
  • Hands-on experience with experiment tracking tools (MLflow, ClearML)
  • Experience with Kubernetes and containerized systems (Docker)
  • Familiarity with CI/CD pipelines
  • Understanding of distributed systems and scalable architectures
  • Experience working with LLMs or transformer-based models
  • Experience working with computer vision systems (YOLO, Faster R-CNN)
  • Focus on deployment and integration, not pure research
  • Systems thinker who prioritizes reliability over novelty
  • Comfortable operating in complex, evolving environments
  • Focused on delivering real-world outcomes
  • Active TS/SCI clearance strongly preferred
  • Candidates with an active Secret clearance may be considered and supported for upgrade
  • Candidates without an active clearance must be U.S. citizens, eligible to obtain and maintain a clearance, and able to work in a CAC-enabled or secure environment

Responsibilities

  • Own the ML Lifecycle (End-to-End)
      • Build and operate production-grade ML pipelines
      • Orchestrate workflows using Kubeflow, Airflow, or Argo
      • Implement model versioning, lineage, and reproducibility standards
  • Operationalize AI/ML Systems
      • Deploy models into secure and constrained environments
      • Transition workflows from experimentation → containerized pipelines → production systems
      • Enable both batch and real-time inference architectures
  • Engineer for Reliability
      • Design systems for reproducibility, auditability, and stability
      • Monitor model performance and system health using Prometheus, Grafana, OpenTelemetry
      • Detect and resolve issues such as model drift and system degradation
  • Build Cloud-Native ML Infrastructure
      • Deploy and manage Kubernetes-based ML workloads
      • Containerize pipelines using Docker
      • Support scalable training and inference workflows
  • Establish Data Discipline
      • Support feature engineering and dataset preparation
      • Implement data versioning and governance practices (e.g., lakeFS)
      • Apply metadata and data management standards
  • Create Repeatable Systems
      • Develop runbooks, playbooks, and documentation
      • Build systems that are operationally sustainable and transferable

Benefits

  • 100% covered certifications & training aligned to your role
  • 401(k) with 100% match up to 6%
  • Highly competitive PTO
  • Comprehensive Medical, Dental, Vision coverage
  • Life Insurance + Short & Long-Term Disability
  • Home office & equipment plan
  • Industry-leading weekly pay schedule