Compiler Architect

d-Matrix
Santa Clara, CA (Hybrid)

About The Position

At d-Matrix, we are focused on unleashing the potential of generative AI to power the transformation of technology. We are at the forefront of software and hardware innovation, pushing the boundaries of what is possible. Our culture is one of respect and collaboration. We value humility and believe in direct communication. Our team is inclusive, and our differing perspectives allow for better solutions. We are seeking individuals who are passionate about tackling challenges and driven by execution. Ready to come find your playground? Together, we can help shape the endless possibilities of AI.

The Role: Compiler Architect

As a hands-on front-end software compiler architect focused on cloud-based AI inference, you will drive the design and implementation of a scalable MLIR-based compiler framework optimized for deploying large-scale NLP and transformer models in cloud environments. You will architect the end-to-end software pipeline that translates high-level AI models into efficient, low-latency executables on a distributed, multi-chiplet hardware platform featuring heterogeneous compute elements such as in-memory tensor processors, vector engines, and hierarchical memory.

Your compiler designs will enable dynamic partitioning, scheduling, and deployment of inference workloads across cloud-scale infrastructure, supporting both statically compiled and runtime-optimized execution paths. Beyond core compiler development, you will focus on strategies that minimize inference latency, maximize throughput, and make efficient use of compute and memory resources in data center environments. You will collaborate cross-functionally with systems architects, ML framework teams, runtime developers, performance engineers, and cloud orchestration groups to ensure seamless integration and optimized inference delivery at scale.

Requirements

  • BS (15+ years), MS (12+ years), or PhD (10+ years) in Computer Science or Electrical Engineering, with 12+ years of experience in front-end compiler and systems software development, focused on ML inference.
  • Deep experience in designing or leading compiler efforts using MLIR, LLVM, Torch-MLIR, or similar frameworks.
  • Strong understanding of model optimization for inference: quantization, fusion, tensor layout transformation, memory hierarchy utilization, and scheduling.
  • Expertise in deploying ML models to heterogeneous compute environments, with specific attention to latency, throughput, and resource scaling in cloud systems.
  • Proven track record working with AI frameworks (e.g., PyTorch, TensorFlow), ONNX, and hardware backends.
  • Experience with cloud infrastructure, including resource provisioning, distributed execution, and profiling tools.

Nice To Haves

  • Experience targeting inference accelerators (AI ASICs, FPGAs, GPUs) in cloud-scale deployments.
  • Knowledge of cloud deployment orchestration (e.g., Kubernetes, containerized AI workloads).
  • Strong leadership skills with experience mentoring teams and collaborating with large-scale software and hardware organizations.
  • Excellent written and verbal communication; capable of presenting complex compiler architectures and trade-offs to both technical and executive stakeholders.

Responsibilities

  • Architect the MLIR-based compiler for cloud inference workloads, focusing on efficient mapping of large-scale AI models (e.g., LLMs and transformers, ingested via Torch-MLIR) onto distributed compute and memory hierarchies.
  • Lead the development of compiler passes for model partitioning, operator fusion, tensor layout optimization, memory tiling, and latency-aware scheduling.
  • Design support for hybrid offline/online compilation and deployment flows with runtime-aware mapping, allowing for adaptive resource utilization and load balancing in cloud scenarios.
  • Define compiler abstractions that interoperate efficiently with runtime systems, orchestration layers, and cloud deployment frameworks.
  • Drive scalability, reproducibility, and performance through well-designed IR transformations and distributed execution strategies.
  • Mentor and guide a team of compiler engineers to deliver high-performance inference-optimized software stacks.

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Education Level: Ph.D. or professional degree
  • Number of Employees: 101-250
