Sr. ML Engineer

Visa•Austin, TX

1d•$130,700 - $202,300•Hybrid

About The Position

The Sr. ML Engineer is responsible for designing, building, and managing the scalable cloud infrastructure that powers our AI and Machine Learning applications. Rather than focusing primarily on model building, this role is suited for a specialist with deep expertise in MLOps, AWS cloud architecture, Kubernetes, and system design. You will own key modules of the ML platform, perform architectural reviews, and implement robust deployment standards. The team is tasked with building secure, scalable pipelines and serving infrastructure for both traditional ML and modern Generative AI (LLM) workloads. The successful candidate will act as a design authority for model deployment, infrastructure automation, and platform security, shaping best practices to enable our data scientists and AI engineers to seamlessly transition models from research to production.

All roles require digital fluency, including the ability to work with emerging technologies and AI-assisted tools - such as AI coding assistants (e.g., GitHub Copilot, ChatGPT, Claude Code, CLine), advanced reasoning GenAI models, and enterprise productivity tools - to enhance engineering productivity and support everyday work.

Requirements

2+ years of relevant work experience and a Bachelor's degree, OR 5+ years of relevant work experience.
Experience in developing and implementing scalable AI/ML models and algorithms.
Experience managing Kubernetes clusters and Kubeflow for ML pipelines.

Nice To Haves

3 or more years of work experience with a Bachelor’s Degree or more than 2 years of work experience with an Advanced Degree (e.g. Masters, MBA, JD, MD).
Experience in building ML serving infrastructure (vLLM, TensorRT-LLM, KServe, Triton).
Experience in implementing secure architectures (IAM, VPCs, least privilege).
Experience automating infrastructure with Terraform/CloudFormation.
Experience developing CI/CD pipelines for ML model deployment.
Experience implementing monitoring tools (CloudWatch, Prometheus, Grafana).
Cloud-agnostic experience is welcome.

Responsibilities

Design, build, and maintain scalable, highly available Machine Learning infrastructure on AWS and Visa OnPrem.
Deploy, configure, and manage Kubernetes clusters and Kubeflow to orchestrate complex ML training and deployment pipelines.
Build robust serving infrastructure to productionize machine learning models and Large Language Models (LLMs) using modern serving frameworks (e.g., vLLM, TensorRT-LLM, KServe, Triton).
Design secure platform architectures utilizing AWS IAM (roles, policies, least privilege), VPCs, and security groups to ensure data and model security.
Architect scalable cloud systems and automate infrastructure provisioning using tools like Terraform or AWS CloudFormation.
Develop and maintain automated CI/CD pipelines for model training, testing, and deployment, ensuring seamless continuous integration.
Partner closely with Data Scientists and AI Engineers to understand their compute and tooling needs, reducing friction in the model development lifecycle.
Implement logging, monitoring, and alerting for ML models and underlying infrastructure (e.g., CloudWatch, Prometheus, Grafana) to track system health, model drift, and latency.
Act as a technical guide to modernize legacy deployment pipelines and integrate emerging AI infrastructure technologies.