Member of Technical Staff — CI Engineer

RadixArk • Palo Alto, CA

About The Position

RadixArk is hiring a Member of Technical Staff — CI Engineer to own the infrastructure that keeps SGLang moving. Our CI system runs 300+ GPU tests across NVIDIA, AMD, Intel, and Ascend hardware pools, gating every commit to one of the fastest-growing open-source LLM inference engines. When CI is green and fast, 100+ contributors ship with confidence. When it isn't, the entire project stalls. That bottleneck is your problem to solve.

You won't just maintain pipelines; you'll architect them. You'll replace brittle static thresholds with regression-based detection, harden runners against supply-chain attacks from fork PRs, and cut cycle times so contributors get feedback in minutes, not hours. You'll work directly with core maintainers, hardware partners, and the open-source community to keep the system that gates every merge request trustworthy, fast, and secure.

This is not a role for someone who wants to write CI YAML and walk away. It's for an engineer who treats CI infrastructure the way we treat serving infrastructure: as a system worth designing well.

Requirements

3+ years operating CI/CD at scale (GitHub Actions, Buildkite, Jenkins, GitLab CI, or similar)
Deep knowledge of Linux, Docker, and GPU computing
Self-hosted runner management experience
Strong Bash and Python skills
Security mindset — CI supply chain risks, fork PR attack vectors, runner hardening
Experience with NVIDIA GPU drivers, CUDA, NCCL, and InfiniBand/RDMA in CI contexts
Familiarity with ML inference workloads (model loading, KV cache, quantization)

Nice To Haves

Large open-source project CI experience (100+ contributors)
AMD ROCm or Intel XPU CI pipelines

Responsibilities

Own CI reliability end-to-end — triage failures, distinguish real regressions from flaky tests and infra issues, keep main green
Build regression-based CI — replace hardcoded static thresholds with automated baseline comparison (metrics pipeline, durable storage, detection logic)
Harden runner infrastructure — ephemeral runners, container isolation, security hardening for fork PR execution
Cut CI time — right-size eval suites, deduplicate server startups, separate PR smoke tests from nightly full runs
Improve developer experience — faster feedback, clearer failure messages, workflow orchestration

Benefits

We offer a competitive base salary with meaningful equity, comprehensive health benefits, and flexible work arrangements.
Compensation is determined by location, level, and experience.
