Senior Software Engineer, Back End

Do you love building at the intersection of infrastructure and artificial intelligence? Do you enjoy solving complex distributed systems problems in a fast-paced, collaborative, and iterative environment? At Capital One, you'll be part of a big group of makers, breakers, doers, and disruptors who love to solve real problems.

We are seeking specialized Backend Software Engineers who are passionate about building the engines that power the next generation of AI. As a Software Engineer on the AI Training Platform team, you won’t just be building applications; you will be building the foundational Managed Services, SDKs, and Compute Infrastructure that allow our Data Scientists to train massive models across hundreds of distributed GPUs. You will be on the forefront of driving a major machine learning transformation within Capital One.

What You’ll Do
Build the Platform: Design and develop the control plane and managed services that orchestrate complex AI training workloads across large-scale GPU clusters.
Empower Data Scientists: Build intuitive SDKs and CLIs that abstract the complexity of distributed computing, allowing data scientists to focus on modeling rather than infrastructure.
Master Distributed Systems: Solve hard problems related to job scheduling, resource allocation, and fault tolerance across hundreds of distributed GPUs using tools like Kubernetes and Ray.
Optimize Performance: Debug and optimize the training stack, from the network layer (NCCL, MPI) to the framework level (PyTorch), ensuring high utilization of expensive GPU resources.
Collaborate & Innovate: Partner with Machine Learning Engineers to understand their pain points and deliver robust cloud-based solutions. Stay on top of HPC trends, experiment with new orchestration patterns, and mentor others in the engineering community.
Tech Stack: Use programming languages like Python, Go, and C++, alongside containerization and orchestration tools (Docker, Kubernetes), GPU hardware (NVIDIA), and AWS cloud infrastructure.
Job Type
Full-time
Career Level
Senior
Number of Employees
5,001-10,000 employees