About The Position

We are seeking a Principal Engineer to define and architect the next generation of distributed AI systems across heterogeneous compute platforms, including CPUs, GPUs, IPUs/FNICs, and emerging dataflow accelerators. This role focuses on one of the hardest problems in modern computing: how to dynamically execute and optimize large-scale AI computation graphs across diverse hardware while managing state, locality, and performance at system scale. You will operate at the intersection of systems architecture, high-performance computing, and AI infrastructure, defining the execution model, runtime abstractions, and placement strategies that turn a rack of heterogeneous devices into a coherent, programmable system.

Requirements

  • Bachelor's degree in Computer Science, Software Engineering, or a related field, or equivalent experience
  • 12-plus years of experience with a Bachelor's degree
  • Proven expertise in defining and implementing software architectures for AI frameworks, protocols, and algorithms
  • Deep experience in systems architecture, high-performance computing, or distributed systems
  • Strong background in parallel or data-parallel computation models
  • Experience with heterogeneous compute environments (CPU, GPU, DSP, or accelerators)
  • Proven ability to design end-to-end systems from abstraction through implementation
  • Strong understanding of performance trade-offs across compute, memory, and interconnect

Nice To Haves

  • 8-plus years of experience with a Master's degree, or 6-plus years of experience with a PhD
  • Experience with AI/ML systems, inference infrastructure, or large-scale model serving
  • Familiarity with stream processing, dataflow models, or graph execution systems
  • Knowledge of modern AI frameworks or runtimes
  • Experience building developer-facing SDKs or programming models
  • Background in performance optimization and benchmarking

Responsibilities

  • Define a runtime model for executing AI workloads as distributed computation graphs across heterogeneous resources
  • Design abstractions for graph representation, dependencies, and execution semantics
  • Enable dynamic scheduling and execution across CPUs, GPUs, IPUs/FNICs, and specialized accelerators
  • Architect systems where state (e.g., KV cache) is a first-class concern in scheduling and execution
  • Define models for data locality, memory hierarchy, and state ownership in distributed inference solutions
  • Optimize for minimal data movement and efficient access to distributed state
  • Develop mechanisms to analyze AI computation graphs and classify stages by: compute intensity, memory bandwidth requirements, communication cost, latency sensitivity
  • Drive automated or semi-automated partitioning of workloads across heterogeneous compute
  • Architect frameworks that treat specialized accelerators (e.g., dataflow engines) as first-class execution targets
  • Define execution boundaries, data exchange models, and integration strategies across device classes
  • Enable interoperability across diverse compute paradigms without sacrificing performance
  • Design runtime strategies for Mixture-of-Experts (MoE) models, including: expert placement, routing locality, load balancing vs data movement trade-offs
  • Enhance existing frameworks for MoE, optimizing the communication path with IPUs/FNICs and the compute path with Intel accelerators
  • Enable adaptive execution based on real-time system signals (latency, utilization, skew)
  • Define observability and telemetry models for distributed AI execution
  • Build feedback loops that continuously optimize placement, scheduling, and resource utilization
  • Drive system-level performance across latency, throughput, and efficiency metrics
  • Operate as a technical leader and architect, not just an implementer
  • Drive cross-team alignment across hardware, software, and infrastructure
  • Influence long-term system design and platform direction
  • Mentor engineers and shape architectural thinking across the organization

Benefits

  • Competitive pay
  • Stock bonuses
  • Health benefit programs
  • Retirement benefit programs
  • Vacation benefit programs
© 2024 Teal Labs, Inc