Machine Learning Engineer II - Inference Platform

Beatdapp • Vancouver, BC

About The Position

Beatdapp is seeking a Machine Learning Engineer II specializing in inference platforms. This role is central to building and scaling machine learning inference systems for audio, covering music, podcasts, and speech, with a focus on AI-generated sound. The engineer will bridge the gap between raw audio and the detection models, partnering with data scientists to bring models into production with attention to latency, cost, and customer-facing edges. The work spans GPU-bound inference containers, multi-cloud infrastructure, API layers, data and observability, and CI/CD. The ideal candidate has strong engineering judgment, a feel for clean, scalable design, good code hygiene, and the courage to advocate for better approaches.

Requirements

Related STEM degree (BSc, MSc, or higher).
3+ years of work experience in platform/infra/backend/ML/applied-ML/data engineering.
Ability to write clean, scalable, production-grade code in Python or a performance-oriented language (Go, Rust, C++).
Architectural fluency across data stores, distributed systems, caching, and data transfer protocols.
Data engineering skills: comfort building data processing pipelines and working with SQL (Airflow, BigQuery, Postgres).
Deep cloud infrastructure and networking experience across one or more platforms (GCP, AWS).
Familiarity with ML platform tooling (e.g., MLflow) and model lifecycle processes.
Terraform experience: ability to write and modify modules, understand state and backends.
Observability instincts: comfortable instrumenting across hardware, application, and model layers.
Inference performance tuning experience with high-throughput GPU services.
Strong written communication skills for runbooks, design docs, PR descriptions, and postmortems.

Nice To Haves

Hands-on work experience with audio or media systems.
Experience with signal processing.
Experience with detecting synthetic (artificial) speech.
Experience with computer vision.
GPU work beyond running inference (CUDA, kernels, drivers, cluster operations).
Experience with streaming systems (Kafka, Pub/Sub, Kinesis, or similar).

Responsibilities

Build, tune, and ship inference containers, including Dockerfiles, dependencies, image size, cold-starts, GPU access, and multi-cloud orchestration (ECS, Cloud Run, GKE, EKS).
Optimize in-container performance and resource utilization on GPUs, including concurrency tuning, VRAM accounting, request timeouts, queueing, rate limiting, and multi-GPU distribution.
Build and run scale and stress testing scenarios to characterize performance, identify breaking points, and inform autoscaling and instance-sizing decisions.
Operate Terraform stacks across multiple clouds (GCP, AWS), managing networking, identity, GPU nodes, autoscaling, and per-tenant configurations.
Build and extend the customer-facing API layer, handling client authentication, rate limiting, data isolation, and request metering.
Maintain and extend data orchestration pipelines for model evaluation, customer reporting, and operational dashboards.
Build and tune metrics, dashboards, logging, and alarms across inference services, running instances, and deployed models.
Ensure CI/CD discipline, including cloud OIDC, image signing, pinned versions, and reproducible CI.
Instrument across hardware, application, and model layers for observability (latency, throughput, score distributions, drift).
