Senior AI Systems Engineer

ARA, Albuquerque, NM
Remote

Requirements

  • Bachelor’s degree in Computer Science, Engineering, Information Technology, or a related STEM field, plus 8-10 years of engineering experience.
  • 2+ years of experience supporting AI/ML platforms, MLOps workflows, model deployment, or AI-enabled infrastructure.
  • Strong coding and automation skills in Python, Bash, or similar scripting languages.
  • Experience with AI/ML frameworks and tooling such as PyTorch, Hugging Face, or similar ecosystems.
  • Proficiency with DevOps and MLOps practices, including CI/CD pipelines, Git-based workflows, containerization, and Kubernetes.
  • Experience deploying AI/ML models or AI services into operational environments, including containerized, cloud, or high-performance computing environments.
  • Familiarity with security frameworks and compliance standards such as NIST and CMMC.
  • Familiarity with AI security functionality in enterprise environments, including OAuth.
  • Strong communication skills and the ability to collaborate effectively across technical and non-technical teams.

Nice To Haves

  • Advanced degree or certifications related to AI or machine learning.
  • Experience integrating AI models into scientific workflows.
  • Familiarity with large language model (LLM) APIs and orchestration frameworks, such as OpenAI, Hugging Face, LangGraph, or LangChain.
  • Experience with model serving, inference optimization, or AI platform tools such as MLflow, Kubeflow, vLLM, or similar.
  • Experience with simulations for scientific or engineering projects, particularly physical systems simulations.
  • Experience with GPU-based systems or running AI models in HPC environments.
  • Experience writing and deploying Model Context Protocol (MCP) servers on Kubernetes.
  • Department of Defense (DoD) experience.
  • Secret security clearance (active or inactive).

Responsibilities

  • Lead the deployment, integration, and operational support of AI platforms, tools, and services, ensuring compatibility with existing systems and enterprise processes.
  • Design, implement, monitor, and optimize AI infrastructure, working with server, cloud, and platform engineering teams.
  • Operationalize machine learning workflows and support AI-enabled applications from development through production deployment and sustainment.
  • Build and maintain CI/CD and MLOps pipelines for model packaging, testing, deployment, rollback, and lifecycle management.
  • Implement infrastructure automation using scripting, Infrastructure as Code, and configuration management practices.
  • Provide ongoing technical support, troubleshooting, root cause analysis, and documentation for AI platforms and user-facing AI services.
  • Maintain observability across AI systems through logging, metrics, performance monitoring, alerting, and incident response practices.
  • Ensure security, compliance, and governance requirements are met, including participation in audits, vulnerability management, and secure architecture reviews.
  • Assess and implement system enhancements to improve performance, scalability, reliability, and cost efficiency.
  • Collaborate across divisions to support diverse AI initiatives and align technical implementations with mission and business objectives.
  • Evaluate emerging AI tools, frameworks, and infrastructure approaches for operational fit, supportability, and long-term value.
  • Develop and maintain technical documentation, runbooks, architecture diagrams, and operational procedures.