Inference ML API SDET

Cerebras Systems • Toronto, ON

Hybrid

About The Position

Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. This architecture allows Cerebras to deliver industry-leading training and inference speeds: over 10 times faster than GPU-based hyperscale cloud inference services. This order-of-magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence through additional agentic computation. Cerebras works with leading model labs, global enterprises, and cutting-edge AI-native startups. OpenAI recently announced a multi-year partnership with Cerebras to deploy 750 megawatts of compute capacity, transforming key workloads with ultra-high-speed inference.

About The Team

The Cloud Quality team is responsible for the confidence behind every production release shipped to Cerebras Inference Cloud. We work closely with platform, infrastructure, ML systems, and product engineering teams to ensure that rapid iteration never comes at the expense of customer trust. Our environment spans distributed cloud systems, multi-region deployments, APIs, orchestration layers, and hardware-backed inference services.

We are scaling quickly. Our systems are growing in complexity, traffic is increasing rapidly, and release velocity remains high. We need engineers who can build quality systems that scale with the business.

About The Role

As a Senior Software Engineer in Test on the ML API Features team, you will lead testing strategy and execution for AI/ML models, evaluating accuracy, fairness, and performance at scale. You will serve as a key technical leader in delivering and validating all software and hardware components for Cerebras API Features. You will own feature-integration quality for software components and drive pre-deployment and production validation for Cerebras inference solutions.

In this role, you will define and champion testing best practices, establish robust debugging methodologies, and mentor junior engineers while advocating for world-class product quality.

Requirements

5+ years of relevant industry experience in software integration, development, or quality engineering.
Deep expertise in automation and programming using one or more languages such as Python, C++, or Go; ability to design and build reusable test frameworks from the ground up.
Proven experience testing compute, machine learning, networking, or storage systems within large-scale enterprise environments.
Strong track record of debugging complex issues across distributed, scaled-out deployments.
Demonstrated ability to lead cross-functional quality initiatives involving product development, product management, customer operations, and field teams.
Excellent verbal and written communication skills, with experience presenting technical findings to both engineering and leadership audiences.
Strong organizational skills, ownership mindset, and ability to drive projects to completion independently.
Experience leading and mentoring engineers across geographically dispersed teams and time zones.

Nice To Haves

Hands-on experience with ML workloads including LLM and/or multimodal training or inference.
Deep familiarity with hardware architecture, performance optimizations, compilers, and ML frameworks.
Experience designing test strategies for distributed systems, cloud infrastructure, and security validation.
Experience with microservices deployment, debugging, and orchestration at scale.
Prior experience owning or significantly contributing to a team's quality engineering culture or test infrastructure.

Responsibilities

Architect and own end-to-end test strategies for new features, developing scalable tests, frameworks, and tooling to ensure quality.
Lead contributions to industry-standard benchmarks and drive adoption of rigorous evaluation methodologies.
Define and drive automation initiatives to significantly improve internal engineering efficiency and test coverage.
Make strategic decisions around coverage trade-offs, resource requirements, and risk-based testing priorities.
Serve as a technical anchor in a highly agile environment, adapting quickly to shifting priorities while maintaining quality standards.
Mentor and guide junior SDETs on testing methodology, debugging practices, and automation development.
Proactively identify systemic quality gaps and drive cross-functional initiatives to address them.
Lead and facilitate effective technical communication across teams and time zones.