AI Operations Platform Consultant

ELEVI Associates | Jersey City, NJ
$75,000 | Hybrid

About The Position

ELEVI Associates is seeking an AI Operations Platform Consultant; the full requirements and responsibilities are listed below.

Why Work at ELEVI?

To become an ELEVI employee is to become an integral part of the team. From Federal to Commercial, ELEVI offers our employees the ability to expand their skills and fully engage their minds and passions. Each member of the team enjoys the opportunity to grow their career to meet their individual goals and objectives. We celebrate the ideas that bring about positive change, and the diverse talents and backgrounds that come together to create those ideas.
Our benefit packages include competitive compensation, financial and counseling services, retirement options, and health insurance programs, coupled with work/life benefits that address both major life events and the everyday demands of juggling work, family, and life. We trust you to take the time you need when you feel it's appropriate; there is no need to keep track or save up. We trust our employees and empower them to shape their own work so that we can achieve the best possible results.

ELEVI is an equal opportunity employer (EOE) that empowers our people. It is the policy of ELEVI to provide equal employment opportunities to all employees and employment applicants without regard to unlawful considerations of race, religion, color, national origin, sex, sexual orientation, gender identity or expression, age, sensory, physical, or mental disability, marital status, veteran or military status, genetic information, or any other classification protected by applicable local, state, or federal laws. We fearlessly drive change, because without diversity of thought and a commitment to equality for all, there is no moving forward. Reasonable accommodations are available for qualified individuals with disabilities upon request. This policy applies to all aspects of employment, including, but not limited to, hiring, job assignment, compensation, promotion, benefits, training, discipline, and termination.

Requirements

  • LLM
  • Kubernetes
  • Experience deploying, managing, operating, and troubleshooting containerized services at scale on Kubernetes for mission-critical applications (OpenShift)
  • Experience deploying, configuring, and tuning LLMs using TensorRT-LLM and Triton Inference Server
  • Experience deploying and troubleshooting LLMs on a containerized platform, including monitoring and load balancing
  • Experience with standard processes for operating a mission-critical system: incident management, change management, event management, etc.
  • Knowledge of Triton Inference Server, including its architecture, configuration, and deployment
  • Model optimization techniques using Triton with TensorRT-LLM
  • Model optimization techniques, including pruning, quantization, and knowledge distillation
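The last bullet names pruning, quantization, and knowledge distillation. As a minimal illustration of one of these, the sketch below shows symmetric post-training int8 quantization of a weight tensor in pure Python; the helper names are hypothetical and not from any specific framework (production toolchains such as TensorRT perform this internally):

```python
# Symmetric per-tensor int8 quantization: map float weights into [-127, 127]
# with a single scale factor, then dequantize to inspect the rounding error.
# Illustrative sketch only, not a production implementation.

def quantize_int8(weights):
    """Return (int8 codes, scale) for a list of float weights."""
    scale = max(abs(w) for w in weights) / 127.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float weights from int8 codes."""
    return [c * scale for c in codes]

weights = [0.02, -1.27, 0.635, 0.9, -0.4]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
# Rounding error per weight is bounded by about scale / 2.
max_err = max(abs(w - r) for w, r in zip(weights, restored))
```

The trade-off this sketch makes visible: one float32 weight (4 bytes) becomes one int8 code (1 byte) plus a shared scale, at the cost of a bounded reconstruction error.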

Responsibilities

  • Manage, operate, and support MLOps/LLMOps pipelines, using TensorRT-LLM and Triton Inference Server to deploy inference services in production
  • Set up and operate AI inference service monitoring for performance and availability
  • Maintain scalable infrastructure for deploying and managing LLMs
  • Deploy models in production environments, including containerization, microservices, and API design
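As a hedged sketch of the monitoring responsibility above: an inference service is commonly tracked by availability (the fraction of successful health probes) and tail latency. The pure-Python helpers below compute both from a list of probe results; the function names and the nearest-rank percentile choice are illustrative assumptions, not part of the role description:

```python
# Summarize health-probe results for an inference endpoint:
# availability = successful probes / total probes,
# p95 latency  = 95th-percentile response time (nearest-rank method).
import math

def availability(probes):
    """probes: list of (ok: bool, latency_seconds: float) tuples."""
    return sum(1 for ok, _ in probes if ok) / len(probes)

def p95_latency(probes):
    """95th-percentile latency across successful probes, nearest-rank."""
    lats = sorted(lat for ok, lat in probes if ok)
    rank = max(0, math.ceil(0.95 * len(lats)) - 1)
    return lats[rank]

# Example: 19 healthy probes at 50 ms and one failed probe.
probes = [(True, 0.05)] * 19 + [(False, 1.0)]
# availability(probes) -> 0.95; p95_latency(probes) -> 0.05
```

In practice these numbers would come from a metrics pipeline (for example, scraping Triton's built-in Prometheus metrics endpoint) rather than ad-hoc probes, but the availability/tail-latency pairing is the same.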

Benefits

  • healthcare
  • wellness
  • financial
  • retirement
  • family support
  • continuing education
  • time off benefits
© 2024 Teal Labs, Inc