AI/ML Systems Engineer (2026 New College Graduate)

GlobalFoundries • Richardson, TX

2d • $72,000 - $124,800 • Onsite

About The Position

GlobalFoundries (GF) is a leading full-service semiconductor foundry providing a unique combination of design, development, and fabrication services to some of the world’s most inspired technology companies. With a global manufacturing footprint spanning three continents, GF makes possible the technologies and systems that transform industries and give customers the power to shape their markets.

We offer many full-time employment paths for recent graduates, which provide accelerated training in a fast-paced work environment, cross-functional working opportunities, and talent mobility. New college graduates are provided with mentorship, networking, and leadership opportunities, which give our new team members life-long connections and skills.

We are seeking an early-career AI/ML Systems Engineer to deepen our workload analysis and performance modeling capabilities. You will take ownership of workload characterization and hardware mapping studies, contribute to cross-functional architecture discussions, and help define the team's methodology for estimating and validating performance KPIs. This is a high-impact role for someone who wants to sit at the intersection of machine learning, computer architecture, and systems optimization.

Requirements

Graduating with a Bachelor’s or Master’s degree in Electrical Engineering, Computer Engineering, Computer Science, or a related field from an accredited degree program.
0-2 years of relevant industry experience in systems engineering, hardware architecture, ML infrastructure, or performance engineering.
Must have an overall GPA of at least 3.0 and be in good academic standing.
Fluency in written and verbal English.
Strong mathematical reasoning is a firm requirement. You should be able to construct and manipulate analytical performance models from first principles, deriving bandwidth utilization bounds, reasoning about arithmetic intensity across operator types, estimating latency under queuing or pipeline constraints, and interpreting numerical precision effects on model accuracy and hardware efficiency.
The ability to move fluidly between mathematical formulation and engineering intuition is central to doing this job well.
You are comfortable writing analysis code in Python and can build clean, reproducible models.
You communicate technical results well in both written and spoken form, and you can hold your own in architecture discussions with specialists on either the hardware or software side.

Nice To Haves

Exposure to AI compiler toolchains is preferred.
Familiarity with MLIR, IREE, TVM, or similar compilation infrastructure — even at a conceptual level — will help you engage productively with compiler and runtime engineers and understand how graph-level and kernel-level transformations affect the workloads you analyze.
Prior related internship or co-op experience.
Demonstrated prior leadership experience in the workplace, school projects, competitions, etc.
Project management skills, i.e. the ability to innovate and execute solutions that matter; the ability to navigate ambiguity.
Strong planning & organizational skills.
Experience defining or refining performance KPI frameworks, prior work on edge or mobile SoC workload characterization, hands-on experimentation with MLIR or IREE compilation pipelines, and knowledge of the RISC-V architecture and its Vector/Matrix extensions are strong pluses.

Responsibilities

Independently study AI/ML workloads across the inference and training stack — including CNNs, transformers, recurrent architectures, and emerging model classes — and build quantitative models of their behavior on real and projected hardware.
Identify compute, memory bandwidth, and power bottlenecks using techniques like roofline analysis, operational intensity profiling, and bottleneck decomposition.
Work closely with SoC and IP architecture teams to map workload demands to hardware capabilities and feed your findings into discussions around design tradeoffs, ISA extensions, memory subsystem sizing, and on-chip vs. off-chip bandwidth allocation.
Engage with compiler and runtime teams to identify where kernel optimization, scheduling, or memory layout changes can close performance gaps.
Build spreadsheet or code-based models that project achievable throughput, latency, and efficiency for candidate architectures, then validate those models against silicon or simulation data.
Communicate findings through written reports, presentations, and design review participation.
Perform all activities in a safe and responsible manner and support all Environmental, Health, Safety & Security requirements and programs.