Sr. DevOps Engineer

PTF Consulting, Fort Worth, TX

About The Position

Role Overview

We are seeking a Senior DevOps Engineer to design, build, and operate secure, scalable CI/CD and infrastructure platforms supporting production AI and ML workloads. This role enables Machine Learning Engineers, MLOps Engineers, and Data Engineers by ensuring reliable deployment, monitoring, and operation of mission-critical systems. This is a hands-on infrastructure ownership role for an engineer comfortable operating production platforms, troubleshooting complex systems, and working in security-conscious, compliance-driven environments.

Requirements

  • U.S. Citizen with an active DoD, Intelligence Community, or DHS clearance, or eligibility to obtain and maintain one.
  • Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent professional experience.
  • 5+ years of DevOps engineering experience supporting production environments.
  • 5+ years of Linux system administration experience, including performance tuning and troubleshooting.
  • Hands-on experience with Kubernetes and Docker in production environments.
  • Experience deploying and managing infrastructure in Azure, AWS, or GCP, with Azure experience strongly preferred.
  • Proficiency with scripting languages such as Bash and/or Python.

Nice To Haves

  • Experience supporting AI or ML workloads in production environments.
  • Experience operating in federal, defense, healthcare, or other regulated environments.
  • Familiarity with monitoring and logging stacks such as Prometheus, Grafana, and ELK.
  • Experience with infrastructure-as-code tools such as Terraform.
  • Hands-on experience supporting hybrid or bare-metal infrastructure environments.
  • Relevant certifications, including Certified Kubernetes Administrator (CKA); AWS Certified DevOps Engineer – Professional; and Microsoft Certified: Azure DevOps Engineer Expert.

Responsibilities

  DevOps & Platform Engineering

  • Design, implement, and maintain automated CI/CD pipelines supporting application, data, and ML deployments.
  • Build, operate, and scale Kubernetes-based platforms for containerized workloads.
  • Manage and optimize Linux-based production systems to ensure performance, reliability, and scalability.
  • Implement infrastructure as code using tools such as Terraform or equivalent frameworks.
  • Support infrastructure for data and ML platforms handling TB- to PB-scale datasets.

  Reliability, Monitoring & Security

  • Monitor system health, performance, and availability using tools such as Prometheus, Grafana, and centralized logging solutions.
  • Ensure infrastructure and deployment pipelines meet availability, reliability, and performance targets, including 99.9% uptime.
  • Partner with security and compliance teams to align platforms with standards such as NIST 800-53 and FedRAMP.
  • Troubleshoot and resolve infrastructure, deployment, and performance issues in production environments.

  Collaboration & Enablement

  • Work closely with ML, MLOps, and Data Engineering teams to enable reliable model training, deployment, and scaling.
  • Provide platform tooling, documentation, and operational guidance to development teams.
  • Contribute to operational runbooks, system documentation, and continuous improvement initiatives.

Benefits

  • Competitive salary and comprehensive health benefits.
  • 401(k) with company matching.
  • Clearance sponsorship for eligible candidates.
  • Training and certification support for DevOps, Kubernetes, and cloud platforms.
  • Clear growth path into Lead DevOps Engineer or Platform Engineering leadership roles.