Senior Lead Machine Learning Engineer

Capital One•San Francisco, NY

4d•$229,900 - $286,200

About The Position

At Capital One, we are creating responsible and reliable AI systems, changing banking for good. For years, Capital One has been an industry leader in using machine learning to create real-time, personalized customer experiences. Our investments in technology infrastructure and world-class talent, along with our deep experience in machine learning, position us to be at the forefront of enterprises leveraging AI. From informing customers about unusual charges to answering their questions in real time, our applications of AI and ML are bringing humanity and simplicity to banking.

We are committed to continuing to build world-class applied science and engineering teams to deliver our industry-leading capabilities with breakthrough product experiences and scalable, high-performance AI infrastructure. At Capital One, you will help bring the transformative power of emerging AI capabilities to reimagine how we serve our customers and businesses who have come to love the products and services we build.

Team Description

The Intelligent Foundations and Experiences (IFX) team is at the center of bringing our vision for AI at Capital One to life. We work hand-in-hand with our partners across the company to advance the state of the art in science and AI engineering, and we build and deploy proprietary solutions that are central to our business and deliver value to millions of customers. Our AI models and platforms empower teams across Capital One to enhance their products with the transformative power of AI, in responsible and scalable ways for the highest leverage impact.

The Ideal Candidate

You love to build systems, take pride in the quality of your work, and share our passion to do the right thing. You want to work on problems that will help change banking for good.
You have a passion for staying abreast of the latest research, and an ability to intuitively understand scientific publications and judiciously apply novel techniques in production.
You adapt quickly and thrive on bringing clarity to big, undefined problems. You love asking questions and digging deep to uncover the root of problems, and you can articulate your findings concisely and clearly. You have the courage to share new ideas even when they are unproven.
You are deeply technical. You possess a strong foundation in engineering and mathematics, and your expertise in hardware, software, and AI enables you to see and exploit optimization opportunities that others miss.
You are a resilient trailblazer who can forge new paths to achieve business goals when the route is unknown.

Requirements

Bachelor’s Degree
At least 8 years of experience designing and building data-intensive solutions using distributed computing (Internship experience does not apply)
At least 4 years of experience programming with Python, Scala, or Java
At least 3 years of experience building, scaling, and optimizing ML systems
At least 2 years of experience leading teams developing ML solutions

Nice To Haves

Master's or doctoral degree in computer science, electrical engineering, mathematics, or a similar field
Experience developing, delivering, and supporting ML solutions in a public cloud such as AWS, Azure, or Google Cloud Platform
4+ years of on-the-job experience with an industry recognized ML framework such as scikit-learn, PyTorch, Dask, Spark, or TensorFlow
3+ years of experience with data gathering and preparation for ML models
ML industry impact through conference presentations, papers, blog posts, open source contributions, or patents
Experience developing AI and ML algorithms or technologies (e.g. LLM Inference, Similarity Search and VectorDBs, Guardrails, Memory)
Experience developing and applying state-of-the-art techniques for optimizing training and inference software to improve hardware utilization, latency, throughput, and cost
Ability to communicate complex technical concepts clearly to a variety of audiences

Responsibilities

Design, build, and/or deliver ML models and components that solve real-world business problems, while working in collaboration with a cross-functional team of engineers, research scientists, technical program managers, and product managers.
Leverage or build cloud-based architectures, technologies, and/or platforms, such as AWS Ultraclusters, Hugging Face, VectorDBs, and PyTorch, to deliver optimized ML models at scale.
Construct optimized data pipelines to feed ML models.
Design, develop, test, deploy, and support AI software components, including large language model inference, similarity search, model evaluation, experimentation, governance, and observability.
Invent and introduce state-of-the-art LLM optimization techniques to improve the performance (scalability, cost, latency, and throughput) of large-scale production AI systems.
Contribute to the technical vision and the long-term roadmap of foundational AI systems at Capital One.
Ensure all code is well-managed to reduce vulnerabilities, models are well-governed from a risk perspective, and the ML follows best practices in Responsible and Explainable AI.