Machine Learning Performance Engineer

Keysight Technologies, Inc.•Loveland, OH

7d•Remote

About The Position

The AI Models and Data Science team at Keysight AI Labs is hiring an ML Performance Engineer to make our training and inference stacks as fast as the math allows. You'll own end-to-end performance: profiling training workloads on multi-GPU clusters, writing custom CUDA kernels and LibTorch C++ extensions for hot paths, and optimizing inference for embedding in production software where every millisecond matters. This role sits at the intersection of ML, systems engineering, and HPC. You'll work directly with the MLEs and data scientists driving the modeling work, and with the engineering teams shipping these models into Keysight products.

Requirements

4+ years in ML engineering, performance engineering, or HPC, with substantial production ML experience
Strong Python and C++ — including LibTorch / PyTorch C++ extensions in production
Hands-on experience optimizing both training and inference workloads (not just one)
CUDA experience required — comfortable profiling GPU code with Nsight and reasoning about occupancy, memory hierarchy, and kernel-level tradeoffs
Production deployment experience with ONNX Runtime, TensorRT, or equivalent inference runtimes
Solid software engineering fundamentals: testing, versioning, code review, monitoring
Experience with Docker and container-based deployment

Responsibilities

Profile and optimize training workloads — multi-GPU scaling efficiency, throughput, memory footprint, mixed precision, gradient checkpointing tradeoffs
Profile and optimize inference for low-latency, high-throughput deployment — quantization, graph optimization, kernel fusion, runtime selection
Write custom CUDA kernels and LibTorch (PyTorch C++) extensions to accelerate hot paths in both training and inference
Build and maintain serving infrastructure using ONNX Runtime, TensorRT, and similar inference runtimes, including C++ integration paths for embedding models inside production software
Partner with MLEs and data scientists on performance-aware architecture choices; partner with product engineering on deployment, versioning, and monitoring
Establish performance SLAs and regression tests so models stay fast as they evolve