Software Engineer, Machine Learning Platform

Chime Financial, Inc•San Francisco, CA

$187,000 - $259,000•Hybrid

About The Position

Chime’s Machine Learning Platform (MLP) team builds and operates the infrastructure, tooling, and developer experience that power machine learning across the company. We enable data scientists and ML engineers to develop, train, deploy, and monitor models reliably and efficiently. As a Machine Learning Platform Engineer, you will design and build scalable systems that support model training, feature computation, real-time inference, and experimentation. You’ll work at the intersection of distributed systems, cloud infrastructure, and applied machine learning. This role focuses on building robust foundations that allow ML teams to move quickly while maintaining reliability, governance, and cost efficiency.

Requirements

5+ years of experience in ML infrastructure, platform engineering, or production ML systems
Knowledge of the machine learning model development lifecycle, including data preprocessing, model training, evaluation, and deployment
Experience with distributed systems, cloud computing, or large-scale data processing
Strong foundation in computer science and software engineering principles
Deep interest in the impact and evolution of advanced AI technologies
Hands-on experience with CI/CD pipelines, DevOps practices, and infrastructure as code
Experience with containerization and orchestration technologies such as Docker and Kubernetes
Knowledge of cloud platforms such as AWS and distributed computing frameworks such as Spark and Ray
Experience with GPU programming (CUDA) and GPU cost optimization
Strong programming skills in Python, Go, Scala, Java, or similar languages
Familiarity with infrastructure-as-code (e.g., Terraform, CloudFormation)
Solid understanding of software engineering fundamentals (testing, version control, code review, observability)

Nice To Haves

Experience with distributed compute frameworks such as Ray
Experience building or operating a feature store
Experience with real-time ML systems or model serving
Familiarity with streaming technologies (Kafka, Kinesis, Flink, Spark Streaming, etc.)
Experience supporting ML lifecycle workflows (training, evaluation, deployment, monitoring)
Knowledge of ML experimentation platforms and model governance practices

Responsibilities

Design, build, and operate scalable ML infrastructure on AWS
Develop distributed training and batch processing systems using Ray
Build and maintain infrastructure-as-code using Terraform
Support and evolve the feature store and feature pipelines
Develop data ingestion and streaming systems (e.g., Kinesis, Kafka, Flink, Spark, or similar technologies)
Improve CI/CD workflows for ML models and platform components
Enhance observability, reliability, and cost visibility across ML workloads
Partner closely with Data Science and ML Engineering teams to improve developer experience
Contribute to platform architecture decisions and technical roadmaps
Participate in on-call rotations to support production systems

Benefits

Bonus
Competitive equity package
401k match
Medical benefits
Dental benefits
Vision benefits
Life insurance benefits
Disability benefits
Generous vacation policy
Company-wide Chime Days
Company-wide paid days off
1% of your time off to support local community organizations
Annual wellness stipend
Up to 24 weeks of paid parental leave for birthing parents
12 weeks of paid parental leave for non-birthing parents
Access to Maven, a family planning tool, with $15k lifetime reimbursement for egg freezing, fertility treatments, adoption, and more
In-office perks including backup child, elder, and/or pet care
Subsidized commuter benefit
In-person and virtual events to connect with your fellow Chimers