About The Position

As a DevOps Engineer supporting AI Platform & Cloud Infrastructure, you will design, deploy, and maintain the infrastructure that powers production AI systems in secure mission environments. You will help operationalize machine learning and AI-enabled services, ensuring they are scalable, reliable, and secure. This role blends cloud engineering, container orchestration, infrastructure automation, and AI system support. You will work closely with AI engineers and software developers to build robust environments for model training, inference, experimentation, and deployment. Your work will directly impact the reliability and performance of AI-driven capabilities used in operational contexts.

Requirements

  • Active TS/SCI clearance with polygraph
  • Experience working in cloud environments (e.g., AWS, Azure, or similar)
  • Proficiency with containerization and orchestration technologies (Docker, Kubernetes)
  • Experience implementing infrastructure-as-code solutions (Terraform, CloudFormation, etc.)
  • Experience building and maintaining CI/CD pipelines
  • Familiarity with Linux system administration
  • Understanding of networking fundamentals and cloud architecture principles
  • Ability to diagnose and resolve distributed system issues
  • Experience using Git-based workflows

Nice To Haves

  • Experience supporting AI/ML model deployment or inference systems
  • Familiarity with model serving frameworks or API-based AI integrations
  • Experience implementing observability stacks (Grafana, Prometheus, Elastic APM, etc.)
  • Familiarity with GPU-enabled workloads and compute scaling
  • Experience working with streaming or dataflow systems
  • Knowledge of AI system reliability and handling model failure modes
  • Experience supporting production CNO capabilities and operations
  • Knowledge of end-to-end SIGINT collection and analysis systems
  • Familiarity with Atlassian tools (Jira, Confluence)

Responsibilities

  • Maintain an active TS/SCI clearance with polygraph. Candidates without a current clearance will not be considered.
  • Design and maintain cloud infrastructure supporting AI applications and services
  • Deploy and manage containerized AI workloads in Kubernetes or similar platforms
  • Implement infrastructure-as-code for reproducible and scalable environments
  • Support AI model inference services and associated APIs
  • Build and optimize CI/CD pipelines for AI and data workflows
  • Implement monitoring, logging, and observability solutions for AI services
  • Troubleshoot infrastructure, networking, and performance issues across AI systems
  • Collaborate with AI engineers to ensure environments support experimentation and production use cases
  • Ensure infrastructure aligns with security and compliance requirements
  • Document architectural decisions and operational procedures

Benefits

  • Top salaries because we're top performers
  • Pick your PTO – Everyone values time and money differently, so we give you the flexibility to choose between 3 and 5 weeks of PTO with a corresponding adjustment to your pay. Your choice, your balance.
  • All 11 federal holidays, paid!
  • Up to 2 snow days, paid!
  • We’ll quadruple (4x!) the first 6% you contribute to your 401(k), giving you up to a 24% company match. Contributing less than 6%? Unclaimed matches come right back to you as extra income, giving you a guaranteed 24% that goes to your retirement, to your paycheck, or both. C’mon now! 🚀
  • 100% employer-paid medical, dental, vision, life, and disability insurance. That’s a lot. Already have health coverage elsewhere? No problem – we’ll trade you this benefit for a boost to your salary instead.
  • $5,250 annual education assistance for training, certifications, tuition, and even student loan repayments.
  • Spot bonuses for earning certifications, customer recognition, and just about anything else that makes us go "Hot damn!" We hope to say that many times about you. 🔥
© 2024 Teal Labs, Inc