LLMOps Engineer

Steampunk
McLean, VA
$115,000 - $145,000

About The Position

We are looking for a skilled LLMOps Engineer to help build, operate, and optimize the infrastructure and pipelines that power our predictive and generative AI capabilities. The LLMOps Engineer will work closely with AI Product Engineers, Data Scientists, and senior LLMOps staff to support the deployment, monitoring, and continuous improvement of LLM-based systems. This role is highly technical and hands-on, with opportunities to grow into senior roles through increasing independence, system ownership, and architectural contributions. You will contribute to the growth of our AI & Data Exploitation Practice!

Requirements

  • Ability to hold a position of public trust with the U.S. government.
  • Bachelor’s degree in Computer Science, Data Engineering, Machine Learning, or a related field and 5+ years of experience; OR
  • Master’s degree in Computer Science, Data Engineering, Machine Learning, or a related field and 3+ years of experience.
  • 2+ years of experience in software engineering, DevOps, MLOps, cloud engineering, or data engineering, with exposure to LLM or ML model operations.
  • Proficiency in Python and familiarity with LLM-related tools and frameworks such as Hugging Face Transformers, LangChain, LlamaIndex, or similar.
  • Experience with containerization (Docker) and basic orchestration using Kubernetes or serverless model hosting environments.
  • Hands-on knowledge of cloud platforms (AWS, Azure, or GCP), including compute, storage, and networking fundamentals for AI workloads.
  • Familiarity with CI/CD pipelines, automated testing, and environment provisioning for AI or data systems.
  • Understanding of modern DevSecOps principles, security basics, and safe handling of data used in AI pipelines.
  • Strong debugging and problem-solving skills with an ability to collaborate in cross-functional engineering teams.
  • Strong written and verbal communication skills with the ability to document workflows and explain operational concepts.

Nice To Haves

  • Exposure to vector databases, embedding models, and RAG pattern implementations.
  • Experience working in agile or iterative development environments.

Responsibilities

  • Implement and maintain core LLM pipelines, including model hosting endpoints, embedding pipelines, retrieval layers, and vector database integrations.
  • Build automation for model versioning, dataset management, and configuration tracking to support reproducible and reliable GenAI development.
  • Develop monitoring and observability components for LLM-based applications, including metrics dashboards, alerting, and logging of model outputs and performance.
  • Collaborate with AI Product Engineers to move prototypes into production environments, supporting scalability, testing, and integration with mission systems.
  • Assist in configuring and optimizing inference runtimes, containers, and microservices to ensure responsive and cost-efficient model operations.
  • Contribute to CI/CD pipelines that support automated testing, evaluation workflows, and safe deployment of updated prompts, models, and context strategies.
  • Help implement and maintain basic MLSecOps guardrails, such as input validation, prompt protections, and output filtering.
  • Participate in troubleshooting efforts, root-cause analysis, and issue resolution for operational incidents involving LLM pipelines or infrastructure.
  • Stay current with emerging tools, libraries, and practices for operationalizing LLMs, including orchestration frameworks and inference accelerators.
  • Document system workflows, runbooks, and operational best practices to support team knowledge growth and onboarding.
© 2024 Teal Labs, Inc