Applied Machine Learning Platform Engineer

Buzz Solutions

About The Position

Buzz is revolutionizing the analytics and maintenance of power grid infrastructure through our advanced AI solutions. Our computer vision systems analyze critical infrastructure to enhance safety, reliability, and operational efficiency across the power grid network. We're looking for an entry- to mid-level Applied Machine Learning Platform Engineer to join our computer vision team and improve the databases, cloud infrastructure, and tooling the team builds on, with a focus on scaling our training and data pipelines. You'll work within a team of experienced ML engineers with the autonomy to drive your own projects and the support to keep growing.

Requirements

2-4 years of industry experience in platform, backend, data, or MLOps engineering roles
Python proficiency — idiomatic code, type hints, async patterns, packaging, and performance-aware implementation
Strong software engineering fundamentals — testing, code review, API design, component-level system design
Hands-on experience building and operating distributed cloud machine learning infrastructure
Experience designing and maintaining scalable training infrastructure, managing ML platform reliability, and optimizing data pipelines for throughput at scale
Experience with database design and data systems for ML workloads — schema design, query optimization, and storage strategies for large-scale datasets
Strong skills in workflow orchestration and automation
Solid proficiency in Python and core ML tooling:
Python ecosystem: pytest, FastAPI, Pydantic
Tooling: Git, Docker, uv
Tracking: MLflow, Weights & Biases, or equivalent
Automation: GitHub Actions, CI/CD, Prefect or equivalent
Infrastructure: AWS, GCP, Kubernetes, Helm, Terraform or equivalent
Databases: PostgreSQL, DynamoDB, Bigtable

Responsibilities

Design, build, and maintain scalable training infrastructure for computer vision workloads
Implement and manage distributed training pipelines (multi-GPU, multi-node) to support large-scale model training and hyperparameter tuning
Build and maintain robust data pipelines for ML development
Design database schemas and storage strategies for managing large training datasets, annotations, and model artifacts
Implement and manage feature stores, data versioning, and experiment tracking to support reliable model iteration
Automate existing analysis workflows
Maintain clear documentation for platform components, data contracts, and deployment processes
Communicate infrastructure decisions, tradeoffs, and system limitations clearly to ML engineers and stakeholders
Conduct thorough code reviews and write integration tests for ML pipelines
