Senior Engineer-Cloud AI Infrastructure

Huawei Technologies Canada Co., Ltd., Markham, ON

About The Position

Huawei Canada has an immediate permanent opening for a Senior Engineer. Established in 2014, the Distributed Scheduling and Data Engine Lab is Huawei Cloud's technical innovation center in Canada. The lab researches and develops advanced cloud technologies and supports the productization and iterative optimization of its technical achievements. Current research areas include cloud-native databases, intelligent SQL engines, AI/agent infrastructure, and LLM/agent evaluation technology. The lab fosters a strong technical environment, enabling collaboration with industry experts to build a highly competitive cloud platform. Join a cutting-edge team building next-generation infrastructure for AI and agentic workloads, working at the intersection of research, systems engineering, and product innovation.

Requirements

  • Solid foundation in distributed systems, cloud infrastructure, or systems engineering, with hands-on experience building or operating large-scale systems
  • Proficiency with Kubernetes, cluster scheduling, or equivalent orchestration platforms in production environments
  • Practical experience with AI/ML systems — whether in training pipelines, inference infrastructure, or both
  • Strong programming skills in low-level or systems-oriented languages such as Go or C++
  • Familiarity with LLM serving frameworks (e.g., vLLM, SGLang, Triton, Ray) and an understanding of how they interact with underlying hardware
  • Experience with GPU or accelerator optimization, and a grasp of how hardware constraints shape system design decisions
  • A research-oriented mindset — comfortable reading papers, running experiments, and prototyping ideas to validate architectural decisions
  • Sharp analytical and problem-solving skills, with the ability to model workloads, identify performance bottlenecks, and propose principled solutions

Responsibilities

  • Track and analyze the latest trends in LLMs, agentic AI, and multi-step agent workflows to inform infrastructure direction
  • Investigate and address infrastructure bottlenecks across GPU/NPU utilization, data movement, memory hierarchy, and distributed execution
  • Design system-level architectures for agent execution frameworks, multi-model orchestration, and large-scale inference systems
  • Evaluate AI/agent workload requirements on cloud and hybrid infrastructure, balancing trade-offs across cost, performance, and scalability
  • Deep dive into the full infrastructure stack — from distributed schedulers and inference pipelines to caching and data access patterns
  • Collaborate closely with engineering and product teams to prototype and deliver production-ready solutions grounded in research
  • Translate emerging AI trends and workload patterns into scalable, impactful infrastructure designs


What This Job Offers

Job Type

Full-time

Career Level

Senior

Education Level

Not specified

Number of Employees

5,001-10,000 employees
