Research Engineer, AI Models

EnCharge AI
Hybrid

About The Position

Modern AI workloads, from large language models to diffusion-based generators to multimodal systems, sit at the most compute-intensive frontier of AI and are some of the most promising applications for our hardware’s energy-efficiency advantages. We’re building a vertically integrated AI stack that will showcase the transformative potential of our silicon while delivering real value to customers today. We are seeking a Research Engineer to push the boundaries of AI model quality and efficiency. You’ll build fine-tuning pipelines, develop rigorous benchmarking frameworks, and work at the intersection of ML research and hardware-aware optimization, ensuring our models run efficiently on our silicon.

This is a role for someone who thrives at the boundary between research and engineering. You’ll read papers, implement techniques, and ship production-quality code, all in service of making AI inference faster, cheaper, and better.

Requirements

  • 5+ years of experience in ML research, applied ML, or ML systems
  • Strong fundamentals in Python and PyTorch
  • Hands-on experience with modern AI models (transformers, diffusion models, or other generative architectures)
  • Experience fine-tuning large models and building training/evaluation pipelines
  • Deep understanding of transformers, attention mechanisms, and optimization techniques
  • Comfort reading and implementing techniques from research papers

Nice To Haves

  • Experience with efficient inference techniques (KV cache optimization, attention variants, MoE routing, flow matching)
  • Background in hardware-aware ML optimization or quantization
  • Familiarity with profiling tools (PyTorch Profiler, Nsight, custom instrumentation)
  • Publications in generative modeling, efficient inference, or ML systems
  • Contributions to open-source ML projects

Responsibilities

  • Research and implement state-of-the-art techniques to accelerate AI inference—quantization, sparsity, distillation, speculative decoding, caching strategies, and architectural modifications.
  • Systematically characterize tradeoffs between model quality, latency, throughput, and power consumption to find optimal operating points across different use cases.
  • Partner closely with hardware, compiler, and quantization teams to ensure algorithmic improvements translate to real gains on our silicon.
  • Identify optimizations aligned with our architecture's strengths—maximizing throughput while minimizing power.
  • Shape the feedback loop between model development and the hardware roadmap.
  • Build profiling tools and comprehensive benchmarking frameworks to understand compute bottlenecks, measure model quality across standard and domain-specific evals, and track efficiency metrics.
  • Establish the methodology that informs both algorithmic choices and hardware-software co-design.
  • Build robust fine-tuning workflows for modern AI models, enabling rapid experimentation with LoRA, adapters, and full fine-tuning.
  • Stay current with the rapidly evolving landscape—evaluate new architectures, implement promising techniques, and contribute insights that inform technical and go-to-market strategy.
© 2026 Teal Labs, Inc