AI Platform Architect

Graphcore, Austin, TX

About The Position

Graphcore is a leading innovator in Artificial Intelligence compute, developing hardware, software, and systems infrastructure to drive AI breakthroughs and adoption across industries. As part of the SoftBank Group, Graphcore aims to enable Artificial Super Intelligence and make its benefits accessible to everyone. The company fosters a culture of continuous learning and innovation, with diverse teams of AI research specialists, silicon designers, software engineers, and systems architects.

This role is for a visionary AI Platform Architect responsible for designing and overseeing the full infrastructure stack for demanding distributed AI workloads. The architect will act as the unifying technical authority across hardware, software, compute, network, and storage, architecting a cohesive rack-scale AI platform optimized for trillion-parameter LLM training and high-throughput inference. Responsibilities include orchestrating clustering, distributed training frameworks, and the physical layer (PCIe Gen 5/6, NVMe, RDMA) to provide a powerful platform for AI research and deployment teams.

Requirements

  • Demonstrated ability in systems engineering, cloud architecture, HPC, or hardware engineering, with at least 4 years as a Lead or Principal Architect for large-scale AI or machine learning platforms.
  • Deep practical knowledge of how large models are trained and deployed, including data/tensor/pipeline parallelism and the infrastructure requirements of modern LLM architectures.
  • Authoritative understanding of system-level bottlenecks and data pathways, including deep familiarity with PCIe Gen 5/6, NVMe namespaces, and RDMA (RoCEv2/InfiniBand) integration.
  • Experience with container orchestration platforms and infrastructure-as-code (IaC) tailored for GPU-heavy bare-metal and cloud environments.
  • Exceptional ability to bridge the gap between AI researchers/data scientists and low-level hardware/CPU/memory/storage/GPU/network engineers, translating model requirements into strict infrastructure specifications.
  • Ability to write platform engineering requirements specifications that can guide and influence future silicon designs.

Nice To Haves

  • Hands-on experience with rack-as-a-system AI platforms that integrate the latest networking, cooling, and GPU technologies available on the market.
  • Working knowledge of a scripting language such as Python, and of data formats such as JSON, to characterize workloads on bare-metal AI compute systems and expose issues with current neural engine silicon solutions.

Responsibilities

  • Define the holistic architecture for highly clustered AI environments, ensuring zero-bottleneck data flow between parallel storage systems, AI compute nodes, and ultra-high-bandwidth network fabrics.
  • Influence the strategy for AI workload scheduling and orchestration, using tools like Kubernetes or Slurm to manage distributed training jobs, model checkpointing, and inference serving at massive scale.
  • Profile and eliminate system-level bottlenecks across the entire AI pipeline, tuning everything from deep learning frameworks (PyTorch, DeepSpeed, etc.) down to OS-level NUMA pinning and I/O scheduling.
  • Work closely with software, firmware, and OS engineering to influence platform design, ensuring the software stack fully exploits underlying hardware capabilities, including complex Arm mesh interconnects (RN-I, HN-F, SN-F nodes) and advanced merchant silicon features.
  • Drive the 3-to-5-year technical vision for the AI platform.
  • Collaborate closely with subject matter experts in processor, memory, storage, GPU, thermal, mechanical, BIOS, and manageability disciplines to define requirements specifications.
  • Communicate and present to internal and external silicon teams to influence features, optimized board routing guidelines, power and thermal targets, and the correct feeds and speeds for a competitive AI platform.
  • Conduct in-depth competitive market analysis, including TCO (OPEX/CAPEX) analysis of new technologies.