AI DevOps Systems Administrator

Column Technical Services
Scottsdale, AZ
Onsite

About The Position

Column Technical Services is seeking a highly skilled AI DevOps Systems Administrator to architect, support, and evolve the infrastructure powering our Artificial Intelligence and Machine Learning initiatives in a secure, classified environment in Scottsdale, AZ. In this role, you will work closely with data scientists and machine learning engineers to move models from experimentation to production, driving reliable model development and deployment by optimizing pipelines, maximizing compute performance, and ensuring scalability and security across platforms. This is an opportunity to work with advanced technologies while making a direct impact on mission-critical systems. If you're passionate about AI infrastructure, thrive in high-performance environments, and are ready to take on meaningful, complex challenges, we encourage you to apply.

Sponsorship is not available for this role. Candidates must currently reside in or near Scottsdale, Arizona.

Requirements

  • Minimum of 8 years of relevant experience OR a Master's degree with 6+ years of experience
  • Bachelor's degree in Computer Science, a related discipline, or equivalent experience
  • Deep expertise in server-based operating systems
  • Strong proficiency in Linux environments, containerization, and AI/ML infrastructure
  • Proven ability to serve as a subject matter expert and mentor team members
  • Advanced troubleshooting skills across operating systems, networking, and storage technologies
  • Hands-on experience building, deploying, and maintaining enterprise-scale server environments
  • Candidates must currently reside in or near Scottsdale, Arizona

Nice To Haves

  • Exposure to or experience working with AI/ML workloads is highly desirable
  • Willingness to travel occasionally

Responsibilities

  • Architect, deploy, and support scalable environments for AI/ML training and inference workloads
  • Build and maintain automated CI/CD workflows for machine learning models and AI-driven applications
  • Administer and fine-tune Linux-based systems across physical and virtual infrastructures
  • Implement and manage containerized environments using tools such as Docker and Kubernetes to support scalable ML services
  • Utilize Infrastructure as Code (IaC) solutions (e.g., Terraform, Ansible) to automate provisioning, configuration, and system management
  • Optimize allocation and usage of GPU resources for compute-intensive workloads
  • Establish monitoring, logging, and alerting frameworks to ensure system health, availability, and performance
  • Partner with engineering teams to troubleshoot issues, improve workflows, and meet infrastructure requirements
  • Serve as a key technical point of contact, supporting users and participating in system design and evolution efforts to align with emerging technologies
  • Install, configure, and maintain software and system components
  • Diagnose and resolve technical issues, including access control and permissions
  • Provide guidance and training to users on system functionality
  • Manage daily operations of server environments across both physical and virtual platforms
  • Configure, maintain, and troubleshoot hardware, operating systems, and network interfaces
  • Investigate and resolve system alerts, ensuring continuity of services
  • Develop scripts to streamline and automate repetitive operational tasks
  • Collaborate directly with stakeholders to identify, isolate, and resolve system-related issues impacting broader services