Lead AI/ML Engineer (P4368)

8451
Chicago, OH

About The Position

The Lead AI/ML Engineer role requires a mix of software engineering and AI skills to create, deploy, and maintain computationally efficient proprietary SLM, LLM, and embedding model implementations, serving infrastructure, and end-to-end solutions. This role focuses specifically on model serving and operations within our foundation models team. Success depends on a strong understanding of distributed systems, model serving architectures, GPU cluster management, and MLOps best practices that scale across enterprise workloads and large-scale model deployments.

Requirements

  • Bachelor's degree or higher in Machine Learning, Computer Science, Computer Engineering, Applied Statistics, or related field
  • 5+ years of experience developing cloud-based software solutions with understanding of design for scalability, performance, and reliability in distributed systems
  • 2+ years of hands-on experience with foundation models (LLMs, SLMs, embedding models) in production environments; 2+ years of experience in model serving and inference optimization preferred
  • Deep knowledge of foundation model serving frameworks, particularly Triton Inference Server and vLLM
  • Working experience with PyTorch models and optimization for inference (quantization, pruning, ONNX, TensorRT)
  • Knowledge of distributed GPU computing, CUDA programming, and GPU memory optimization techniques
  • Hands-on experience with GCP and Azure cloud platforms, including GPU instances, managed services, and networking
  • Experience with Databricks for large-scale data processing and model training workflows
  • Knowledge of vector databases and embedding model serving
  • Strong experience with open-source LLM fine-tuning frameworks and techniques (LoRA, QLoRA, full fine-tuning)
  • Experience building large-scale model serving solutions that have been successfully delivered to production with enterprise SLAs
  • Excellent communication skills, particularly on technical topics related to distributed systems and model serving architectures
  • Kubernetes and Docker experience with a focus on GPU workloads and model serving deployments
  • CI/CD pipeline experience with a focus on ML model deployment; GitHub Actions experience preferred
  • Terraform experience for infrastructure as code, particularly for GPU clusters and cloud ML infrastructure
  • Strong skills in Python, with experience in async programming and high-performance computing
  • API development experience with focus on high-throughput, low-latency model serving endpoints
  • Experience with monitoring and observability tools for distributed systems (Prometheus, Grafana, DataDog, etc.)
  • Knowledge of end-to-end machine learning pipelines and MLOps tools (model registry, experiment tracking, feature stores, model monitoring) in the context of foundation models

Nice To Haves

  • Experience with distributed training frameworks such as DeepSpeed, FSDP, FairScale
  • Knowledge of model compression techniques and hardware acceleration
  • Experience with multi-cloud deployments and hybrid cloud architectures
  • Familiarity with emerging foundation model architectures and serving optimizations

Responsibilities

  • Lead large-scale foundation model projects that can span months, focusing on model serving, inference optimization, and production deployment
  • Foster a collaborative and innovative team environment, encouraging professional growth and development among junior team members in foundation model technologies
  • Apply known patterns, frameworks, and tools to automate and deploy foundation model serving solutions using Triton, vLLM, and other inference engines
  • Develop new tools, processes and operational capabilities to monitor and analyze foundation model performance, latency, throughput, and resource utilization
  • Work with researchers and ML engineers to optimize and scale foundation model serving using best practices in distributed systems, GPU orchestration, and MLOps
  • Abstract foundation model serving solutions as robust APIs, microservices, or components that can be reused across the business with high availability and low latency
  • Build, steward, and maintain production-grade foundation model serving infrastructure (robust, reliable, maintainable, observable, scalable, performant) to manage and serve LLMs, SLMs, and embedding models at scale
  • Research state-of-the-art foundation model serving technologies, inference optimization techniques, and distributed GPU architectures to identify new opportunities for implementation across the enterprise
  • Design and implement distributed GPU clusters for model training and inference workloads across GCP and Azure cloud environments
  • Understand business requirements and trade off latency, cost, throughput, and model accuracy to maximize value and translate research into production-ready serving solutions
  • Reduce time to deployment, automate foundation model CI/CD pipelines, implement continuous monitoring of model serving metrics, and establish feedback loops for model performance
  • Conduct code reviews, infrastructure reviews, and production readiness assessments for foundation model deployments
  • Apply appropriate documentation, version control, infrastructure as code practices, and other internal communication practices across channels
  • Make time-sensitive decisions and solve urgent production issues in foundation model serving environments without escalation

Benefits

  • Medical: competitive plan designs and support for self-care, wellness, and mental health.
  • Dental: in-network and out-of-network benefits.
  • Vision: in-network and out-of-network benefits.
  • 401(k) with Roth option and matching contribution.
  • Health Savings Account with matching contribution (requires participation in qualifying medical plan).
  • AD&D and supplemental insurance options to help ensure additional protection for you.
  • Paid time off with flexibility to meet your life needs, including 5 weeks of vacation time, 7 health and wellness days, 3 floating holidays, as well as 6 company-paid holidays per year.
  • Paid leave for maternity, paternity, and family care.