LLM Ops Engineer (Raleigh)

HirexHire
Hybrid

About the Position

We are seeking an experienced LLM Ops Engineer to join our client's newly established LLM Ops Team in their Raleigh, NC office. In this role, you will manage the full lifecycle of Large Language Models (LLMs), from development through deployment, monitoring, and continuous improvement. This is a hybrid role based in the Raleigh, NC area.

Requirements

  • Experience with LLM development, fine-tuning, and deployment
  • Strong programming skills, particularly in Python
  • Experience with Kubeflow, Apache Airflow, MLflow, or other LLM pipeline technologies
  • Experience with Azure OpenAI, AWS SageMaker, and/or Vertex AI
  • Understanding of machine learning operations (MLOps) principles
  • Knowledge of infrastructure scaling and optimization
  • Experience with AI monitoring tools and dashboard creation
  • Familiarity with AI safety, bias detection, and compliance requirements
  • Strong problem-solving abilities and analytical thinking
  • Familiarity with ISO 27001 certification and SOC 2 compliance

Responsibilities

  • Fine-tune pre-trained models for specific use cases
  • Curate and prepare datasets for training
  • Manage training infrastructure, resources, and computational environments
  • Implement optimization techniques to improve model performance
  • Develop and manage APIs for model serving
  • Scale infrastructure to handle varying demand loads
  • Build and maintain the GenAI middleware/sidecar layer
  • Integrate LLMs with existing systems and data sources
  • Track performance metrics including latency and throughput
  • Monitor quality metrics such as hallucination rates and accuracy
  • Optimize costs associated with model inference and training
  • Create and maintain dashboards for real-time performance insights
  • Create and maintain golden datasets for benchmark testing
  • Implement statistical validation methods for model outputs
  • Set up similarity matching criteria for response evaluation
  • Develop confidence score thresholds for production systems
  • Design and implement user feedback collection systems
  • Establish continuous improvement processes
  • Create A/B testing frameworks for model and feature evaluation
  • Conduct trace analysis to identify areas for performance optimization
  • Implement content moderation systems
  • Detect and mitigate bias in model outputs
  • Ensure regulatory compliance in AI systems
  • Develop output validation frameworks
  • Version and store prompts systematically
  • Create and maintain prompt templates
  • Set up playground environments for prompt testing
  • Abstract prompts from application code for better maintainability

Benefits

  • Supportive Company Culture
  • Global, Dynamic, and Diverse Team
  • Comprehensive Benefits Package (health insurance, retirement savings, and generous PTO) with an emphasis on work-life balance
  • Career Growth and Development
© 2024 Teal Labs, Inc