LLM Ops Engineer (Raleigh)

HirexHire

Hybrid

About The Position

We are seeking an experienced LLM Ops Engineer to join our client's newly established LLM Ops team in their Raleigh, NC office. In this role, you will manage the full lifecycle of large language models, from development through deployment, monitoring, and continuous improvement. This is a hybrid position based in the Raleigh, NC area.

Requirements

Experience with LLM development, fine-tuning, and deployment
Strong programming skills, particularly in Python
Experience with Kubeflow, Apache Airflow, MLflow, or other LLM pipeline technologies
Experience with Azure OpenAI, AWS SageMaker, and/or Vertex AI
Understanding of machine learning operations (MLOps) principles
Knowledge of infrastructure scaling and optimization
Experience with AI monitoring tools and dashboard creation
Familiarity with AI safety, bias detection, and compliance requirements
Strong problem-solving abilities and analytical thinking
Familiarity with ISO 27001 and SOC 2 certification

Responsibilities

Fine-tune pre-trained models for specific use cases
Curate and prepare datasets for training
Manage training infrastructure, resources, and computational environments
Implement optimization techniques to improve model performance
Develop and manage APIs for model serving
Scale infrastructure to handle varying demand loads
Build and maintain the GenAI middleware/sidecar layer
Integrate LLMs with existing systems and data sources
Track performance metrics including latency and throughput
Monitor quality metrics such as hallucination rates and accuracy
Optimize costs associated with model inference and training
Create and maintain dashboards for real-time performance insights
Create and maintain golden datasets for benchmark testing
Implement statistical validation methods for model outputs
Set up similarity matching criteria for response evaluation
Develop confidence score thresholds for production systems
Design and implement user feedback collection systems
Establish continuous improvement processes
Create A/B testing frameworks for model and feature evaluation
Conduct trace analysis to identify areas for performance optimization
Implement content moderation systems
Detect and mitigate bias in model outputs
Ensure regulatory compliance in AI systems
Develop output validation frameworks
Version and store prompts systematically
Create and maintain prompt templates
Set up playground environments for prompt testing
Abstract prompts from application code for better maintainability

Benefits

Supportive Company Culture
Global, Dynamic, and Diverse Team
Comprehensive Benefits Package (health insurance, retirement savings, generous PTO, and work-life balance)
Career Growth and Development
