About The Position

We are seeking a Principal Engineer to define and architect the next generation of distributed AI systems across heterogeneous compute platforms, including CPUs, GPUs, IPUs/FNICs, and emerging dataflow accelerators. This role focuses on one of the hardest problems in modern computing: how to dynamically execute and optimize large-scale AI computation graphs across diverse hardware while managing state, locality, and performance at system scale. You will operate at the intersection of systems architecture, high-performance computing, and AI infrastructure, defining the execution model, runtime abstractions, and placement strategies that turn a rack of heterogeneous devices into a coherent, programmable system.

Requirements

  • Bachelor's degree in Computer Science, Software Engineering, or a related field, or equivalent experience
  • 12-plus years of experience with a Bachelor's degree
  • Proven expertise in defining and implementing software architectures for AI frameworks, protocols, and algorithms
  • Deep experience in systems architecture, high-performance computing, or distributed systems
  • Strong background in parallel or data-parallel computation models
  • Experience with heterogeneous compute environments (CPU, GPU, DSP, or accelerators)
  • Proven ability to design end-to-end systems from abstraction through implementation
  • Strong understanding of performance trade-offs across compute, memory, and interconnect

Nice To Haves

  • 8-plus years of experience with a Master's degree, or 6-plus years of experience with a PhD
  • Experience with AI/ML systems, inference infrastructure, or large-scale model serving
  • Familiarity with stream processing, dataflow models, or graph execution systems
  • Knowledge of modern AI frameworks or runtimes
  • Experience building developer-facing SDKs or programming models
  • Background in performance optimization and benchmarking

Responsibilities

  • Define a runtime model for executing AI workloads as distributed computation graphs across heterogeneous resources
  • Design abstractions for graph representation, dependencies, and execution semantics
  • Enable dynamic scheduling and execution across CPUs, GPUs, IPUs/FNICs, and specialized accelerators
  • Architect systems where state (e.g., KV cache) is a first-class concern in scheduling and execution
  • Define models for data locality, memory hierarchy, and state ownership in distributed inference solutions
  • Optimize for minimal data movement and efficient access to distributed state
  • Develop mechanisms to analyze AI computation graphs and classify stages by: compute intensity, memory bandwidth requirements, communication cost, latency sensitivity
  • Drive automated or semi-automated partitioning of workloads across heterogeneous compute
  • Architect frameworks that treat specialized accelerators (e.g., dataflow engines) as first-class execution targets
  • Define execution boundaries, data exchange models, and integration strategies across device classes
  • Enable interoperability across diverse compute paradigms without sacrificing performance
  • Design runtime strategies for Mixture-of-Experts (MoE) models, including: expert placement, routing locality, load balancing vs data movement trade-offs
  • Enhance existing frameworks for MoE, optimizing the communication path with IPUs/FNICs and the compute path with Intel accelerators
  • Enable adaptive execution based on real-time system signals (latency, utilization, skew)
  • Define observability and telemetry models for distributed AI execution
  • Build feedback loops that continuously optimize placement, scheduling, and resource utilization
  • Drive system-level performance across latency, throughput, and efficiency metrics
  • Operate as a technical leader and architect, not just an implementer
  • Drive cross-team alignment across hardware, software, and infrastructure
  • Influence long-term system design and platform direction
  • Mentor engineers and shape architectural thinking across the organization

Benefits

  • Competitive pay
  • Stock bonuses
  • Health benefit programs
  • Retirement benefit programs
  • Vacation benefit programs
© 2024 Teal Labs, Inc