ML Engineer II (Inference Platform)

Beatdapp • Vancouver, BC

About The Position

Beatdapp is seeking an ML Engineer II specializing in inference platforms to join its team. Beatdapp builds streaming integrity technology, and this role is central to building and scaling its machine learning inference systems for audio. The position bridges the gap between raw audio data and the clean signals that detection models require. You will collaborate with data scientists to bring models into production, making sure latency, cost, and customer-facing concerns are considered early in the design process. The work spans GPU-bound inference containers, multi-cloud infrastructure, API layers, data and observability, and CI/CD pipelines.

The primary architectural challenges are containing drift and scaling systems efficiently with minimal code. The role requires strong engineering judgment, a focus on clean and scalable design, code hygiene, and the ability to advocate for optimal solutions. Roadmaps are short, and the scope of work is dynamic, adapting as the team and systems evolve.

Requirements

Related STEM degree (BSc, MSc, or higher).
3+ years of work experience in platform/infra/backend/ML/applied-ML/data engineering.
Strong engineering skills with the ability to write clean, scalable, production-grade code in Python or a performance-oriented language (Go, Rust, C++).
Architectural fluency across data stores, distributed systems, caching, and data transfer protocols.
Data engineering skills, including comfort building data processing pipelines and using SQL (experience with Airflow, BigQuery, Postgres is a plus).
Deep cloud infrastructure and networking experience across one or more platforms (GCP, AWS).
Familiarity with ML platform tooling (e.g., MLflow) and model lifecycle processes (model versioning, artifact storage, promotion workflows).
Proficiency with Terraform, including writing and modifying modules, understanding state and backends, and Infrastructure as Code (IaC) principles.
CI/CD discipline, including cloud OIDC, image signing, pinned versions, and a focus on cost-effective and reproducible CI.
Observability instincts, including comfort instrumenting across hardware, application, and model layers and knowing which metrics to monitor.
Experience with inference performance tuning for high-throughput GPU services, including micro-batching, concurrency, request queueing, and in-container resource management.
Strong written communication skills for documentation and collaboration.

Nice To Haves

Hands-on work experience with audio or media systems.
Experience with signal processing.
Experience with synthetic (artificial) speech detection.
Experience with computer vision.
GPU work beyond running inference (e.g., CUDA, kernels, drivers, cluster operations).
Experience with streaming systems (e.g., Kafka, Pub/Sub, Kinesis).

Responsibilities

Build, tune, and ship inference containers, including managing Dockerfiles, dependencies, image size, cold starts, and GPU access patterns.
Manage multi-cloud orchestration for inference containers on platforms such as ECS, Cloud Run, GKE, and EKS.
Ensure test coverage for the container surface and manage the underlying storage abstraction.
Optimize in-container performance and resource utilization, including concurrency tuning, VRAM accounting, request timeouts, queueing, rate limiting, and multi-GPU distribution.
Make informed decisions on instance right-sizing based on performance tuning.
Build and execute scale and stress testing scenarios to characterize latency-vs-throughput curves and identify breaking points.
Translate testing results into decisions for autoscaling and instance sizing.
Operate the Terraform stack across multiple clouds (GCP, AWS), managing networking, identity, GPU nodes, autoscaling, and per-tenant configurations.
Build and extend the customer-facing API layer, handling client authentication, rate limiting, data isolation, and request metering.
Maintain and extend data orchestration pipelines for model evaluation, customer reporting, and operational dashboards.
Build and tune metrics, dashboards, logging, and alarms across the inference service, running instances, and deployed models.
Instrument across hardware, application, and model layers to monitor latency, throughput, score distributions, and drift.
Write runbooks, design documents, PR descriptions, and postmortems, and maintain ticket hygiene in Jira.