Senior AI Systems Engineer

ARA•Albuquerque, NM

Hybrid

About The Position

This role involves leading the deployment, integration, and operational support of AI platforms, tools, and services. The engineer will design, implement, monitor, and optimize AI infrastructure, working closely with server, cloud, and platform engineering teams. A key aspect of the role is operationalizing machine learning workflows and supporting AI-enabled applications throughout their lifecycle, from development to production deployment and sustainment. This includes building and maintaining CI/CD and MLOps pipelines for model packaging, testing, deployment, rollback, and lifecycle management. The position also requires implementing infrastructure automation, providing technical support, troubleshooting, and ensuring system observability through logging, metrics, and monitoring. Security, compliance, and governance are critical, as is assessing and implementing system enhancements for performance, scalability, reliability, and cost efficiency. Collaboration across divisions to support diverse AI initiatives and evaluating emerging AI tools and infrastructure approaches are also key responsibilities. The role requires developing and maintaining comprehensive technical documentation.

Requirements

Bachelor’s degree in Computer Science, Engineering, Information Technology, or a related STEM field, plus 8-10 years of engineering experience.
2+ years of experience supporting AI/ML platforms, MLOps workflows, model deployment, or AI-enabled infrastructure.
Strong coding and automation skills in Python, Bash, or similar scripting languages.
Experience with AI/ML frameworks and tooling such as PyTorch, Hugging Face, or similar ecosystems.
Proficiency with DevOps and MLOps practices, including CI/CD pipelines, Git-based workflows, containerization, and Kubernetes.
Experience deploying AI/ML models or AI services into operational environments, including containerized, cloud, or high-performance computing environments.
Familiarity with security frameworks and compliance standards such as NIST and CMMC.
Familiarity with AI security functionality in enterprise environments, including OAuth.
Strong communication skills and the ability to collaborate effectively across technical and non-technical teams.

Nice To Haves

Advanced degree or certifications related to AI or machine learning.
Experience integrating AI models into scientific workflows.
Familiarity with large language model (LLM) APIs and orchestration frameworks such as OpenAI, Hugging Face, LangGraph, or LangChain.
Experience with model serving, inference optimization, or AI platform tools such as MLflow, Kubeflow, vLLM, or similar.
Experience with simulations for scientific or engineering projects, particularly physical systems simulations.
Experience with GPU-based systems or running AI models in HPC environments.
Experience writing and deploying MCP (Model Context Protocol) servers on Kubernetes.
DoD experience.
Secret security clearance (active or inactive).

Responsibilities

Lead the deployment, integration, and operational support of AI platforms, tools, and services, ensuring compatibility with existing systems and enterprise processes.
Design, implement, monitor, and optimize AI infrastructure, working with server, cloud, and platform engineering teams.
Operationalize machine learning workflows and support AI-enabled applications from development through production deployment and sustainment.
Build and maintain CI/CD and MLOps pipelines for model packaging, testing, deployment, rollback, and lifecycle management.
Implement infrastructure automation using scripting, Infrastructure as Code, and configuration management practices.
Provide ongoing technical support, troubleshooting, root cause analysis, and documentation for AI platforms and user-facing AI services.
Maintain observability across AI systems through logging, metrics, performance monitoring, alerting, and incident response practices.
Ensure security, compliance, and governance requirements are met, including participation in audits, vulnerability management, and secure architecture reviews.
Assess and implement system enhancements to improve performance, scalability, reliability, and cost efficiency.
Collaborate across divisions to support diverse AI initiatives and align technical implementations with mission and business objectives.
Evaluate emerging AI tools, frameworks, and infrastructure approaches for operational fit, supportability, and long-term value.
Develop and maintain technical documentation, runbooks, architecture diagrams, and operational procedures.