AI Performance Optimization Engineer

Bright Vision Technologies, Frisco, TX
Remote

About The Position

Bright Vision Technologies is a forward-thinking software development company dedicated to building innovative solutions that help businesses automate and optimize their operations. We leverage cutting-edge technologies to create scalable, secure, and user-friendly applications. As we continue to grow, we’re looking for a skilled AI Performance Optimization Engineer to join our dynamic team and contribute to our mission of transforming business processes through technology. This is a fantastic opportunity to join an established and well-respected organization offering tremendous career growth potential.

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Computer Engineering, or a related field.
  • Six or more years of experience in performance engineering, ML systems, or HPC.
  • Strong proficiency in Python and C++.
  • Hands-on experience optimizing deep learning workloads on modern GPUs.
  • Deep understanding of distributed training and inference techniques.
  • Experience with profiling tools across CPU, GPU, and distributed systems.
  • Familiarity with model compression techniques and their accuracy implications.
  • Strong grasp of memory hierarchies, communication primitives, and parallelism strategies.
  • Excellent measurement, debugging, and analytical reasoning skills.
  • Strong communication and collaboration skills.

Nice To Haves

  • Experience optimizing LLM inference at production scale.
  • Contributions to vLLM, TensorRT-LLM, DeepSpeed, or similar projects.
  • Familiarity with custom kernel authoring in Triton or CUTLASS.
  • Experience with FinOps for AI workloads.
  • Publications or talks on AI systems performance.

Responsibilities

  • Profile and optimize end-to-end AI training and inference pipelines for throughput, latency, and cost.
  • Identify and eliminate bottlenecks across data loading, model compute, communication, and memory.
  • Implement and tune quantization, sparsity, and pruning strategies to reduce model footprint and accelerate inference.
  • Optimize distributed training using tensor parallelism, pipeline parallelism, FSDP, and ZeRO-style sharding.
  • Tune attention implementations using FlashAttention, paged attention, and related techniques.
  • Implement KV cache optimization, continuous batching, and speculative decoding for LLM serving.
  • Drive compiler-level optimizations using Triton, XLA, TorchInductor, or TVM, and work with the broader ML framework community to land improvements that deliver measurable end-to-end performance gains.
  • Optimize data pipelines, sharding strategies, and storage access patterns for high-throughput training.
  • Build and maintain rigorous benchmark suites and regression frameworks across workloads.
  • Collaborate with ML and platform engineering teams to embed best practices in standard pipelines.
  • Drive cost-efficiency improvements through model architecture, hardware selection, and scheduling strategies.
  • Evaluate new hardware and software offerings, and advise on adoption.
  • Document performance tuning playbooks and share findings broadly across engineering teams.
  • Stay current with AI systems research and translate advances into production improvements.

Benefits

  • Competitive base salary commensurate with experience, plus benefits.