Distinguished Engineer

Capital One
San Jose, CA

About The Position

As a Distinguished Engineer at Capital One, you will be a part of a community of technical experts working to define the future of banking in the cloud. You will work alongside our talented team of developers, machine learning experts, product managers and people leaders. Our Distinguished Engineers are leading experts in their domains, helping devise practical and reusable solutions to complex problems. You will drive innovation at multiple levels, helping optimize business outcomes while driving towards strong technology solutions. At Capital One, we believe diversity of thought strengthens our ability to influence, collaborate and provide the most innovative solutions across organizational boundaries. You will promote a culture of engineering excellence, and strike the right balance between lending expertise and providing an inclusive environment where the ideas of others can be heard and championed. You will lead the way in creating next-generation talent for Capital One Tech, mentoring internal talent and actively recruiting to keep building our community.

Distinguished Engineers are expected to lead through technical contribution. You will operate as a trusted advisor for our key technologies, platforms and capability domains, creating clear and concise communications, code samples, blog posts and other material to share knowledge both inside and outside the organization. You will specialize in a particular subject area, but your input and impact will be sought and expected throughout the organization.

We are looking for a visionary technologist to anchor our Foundation Model (FM) Hosting team. As generative AI becomes the core engine of our business, the frontier of our success lies in how efficiently, reliably, and rapidly we can serve massive large language models at scale. In this Distinguished-level role, you won't just be using existing tools; you will be pushing the absolute limits of LLM inference.
You will own the technical strategy for our FM serving stack, bridging the critical gap between our Science teams and our production infrastructure. If you thrive on shaving off milliseconds of latency, writing custom CUDA kernels to bypass hardware bottlenecks, and architecting distributed systems that scale effortlessly on Kubernetes, we want you to lead our next generation of AI infrastructure. If you are ready to provide thought leadership and build engineering excellence across Capital One's engineering teams, come join us in our mission to change banking for good.

Requirements

  • Bachelor’s Degree
  • At least 7 years of experience in software engineering

Nice To Haves

  • Bachelor's or Master's Degree in Computer Science or a related field
  • 10+ years of experience coding in commonly used languages such as Java, Python, Go, JavaScript, TypeScript, or Swift
  • 9+ years of experience in the full lifecycle of system development, from conception through architecture, implementation, testing, deployment and production support
  • 3+ years of experience with public or private cloud technologies
  • 8+ years of experience with Networking (BGP, Wi-Fi, SD-WAN, Cloud Networking and Data Center Networking)
  • Contributions, active maintainer status, or core authorship in open-source AI infrastructure or serving projects (vLLM, TensorRT-LLM, Hugging Face TGI, Ray, or Triton Inference Server).
  • Experience with distributed inference communication primitives
  • Experience optimizing NCCL, heavily utilizing NVLink/NVSwitch, and tuning network fabrics such as InfiniBand/RDMA for complex Tensor Parallelism (TP) and Pipeline Parallelism (PP) architectures.
  • Published research or papers at top-tier Machine Learning and Systems conferences such as MLSys, OSDI, SOSP, NeurIPS, or ICML, or hold patents related to distributed systems, model compression, or AI inference scaling.
  • Experience designing routing and scheduling mechanisms for split-architecture serving or multi-LoRA serving architectures to support thousands of dynamic, personalized model adapters simultaneously.

Responsibilities

  • Articulate and evangelize a bold technical vision for your domain
  • Decompose complex problems into practical and operational solutions
  • Ensure the quality of technical design and implementation
  • Serve as an authoritative expert on non-functional system characteristics, such as performance, scalability and operability
  • Continue learning and injecting advanced technical knowledge into our community
  • Handle several projects simultaneously, balancing your time to maximize impact
  • Act as a role model and mentor within the tech community, helping to coach and strengthen the technical expertise and know-how of our engineering and product community
  • Design and drive the long-term technical roadmap for our Foundation Model Hosting platform, ensuring high throughput, ultra-low latency, and optimal GPU utilization across massive, multi-tenant workloads.
  • Lead performance engineering across both the platform and model layers.
  • Pioneer the implementation of advanced techniques such as speculative decoding, continuous batching, KV-cache optimization (e.g., PagedAttention), and custom quantization strategies (FP8, INT4, AWQ).
  • Act as the primary engineering counterpart to our AI Research & Science teams.
  • Co-design model architectures for deployability, ensuring that the latest foundational models seamlessly transition from the lab to highly optimized production environments.
  • Mentor senior engineers, establish rigorous engineering standards for AI deployment, and foster a culture of uncompromising technical excellence.
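For candidates less familiar with the serving techniques named in the responsibilities above, continuous batching is the core idea behind high-throughput LLM inference engines such as vLLM: new requests join the in-flight batch as soon as any running sequence finishes, rather than waiting for the whole batch to drain. The sketch below is a deliberately simplified, hypothetical model (token generation reduced to a countdown) to illustrate the scheduling behavior only, not a real inference loop.

```python
import collections
import dataclasses

@dataclasses.dataclass
class Request:
    rid: int
    remaining: int  # tokens left to generate (stand-in for a real decode loop)

def continuous_batching(requests, max_batch=4):
    """Toy continuous-batching scheduler: waiting requests are admitted
    into free batch slots on every decode step, and finished sequences
    are retired immediately to free their slots."""
    waiting = collections.deque(requests)
    running = []
    completed = []
    steps = 0
    while waiting or running:
        # Admit waiting requests into any free batch slots this step.
        while waiting and len(running) < max_batch:
            running.append(waiting.popleft())
        # One decode step: every running sequence emits one token.
        for r in running:
            r.remaining -= 1
        steps += 1
        # Retire finished sequences right away (no waiting for the batch).
        completed.extend(r.rid for r in running if r.remaining == 0)
        running = [r for r in running if r.remaining > 0]
    return steps, completed

reqs = [Request(i, n) for i, n in enumerate([2, 5, 3, 1, 4, 2])]
steps, order = continuous_batching(reqs, max_batch=4)
# Finishes all six requests in 5 decode steps; static batching
# (drain batch 1 fully, then batch 2) would need 5 + 4 = 9 steps.
```

The gain comes entirely from the admission-per-step loop: short requests vacate slots that long requests would otherwise hold hostage for the whole batch.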

Benefits

  • Comprehensive, competitive, and inclusive set of health, financial, and other benefits that support your total well-being