About The Position

Beatdapp is seeking a Machine Learning Engineer II specializing in Inference Platforms. This role is central to building and scaling machine learning inference systems for audio, covering music, podcasts, and speech, with a focus on AI-generated sound. The engineer will bridge the gap between raw audio and the detection models, partnering with data scientists to bring models into production with close attention to latency, cost, and customer-facing edges. The work spans GPU-bound inference containers, multi-cloud infrastructure, API layers, data and observability, and CI/CD. The ideal candidate has strong engineering judgment, a feel for clean, scalable design and code hygiene, and the courage to advocate for better approaches.

Requirements

  • Related STEM degree (BSc, MSc, or higher).
  • 3+ years of work experience in platform/infra/backend/ML/applied-ML/data engineering.
  • Ability to write clean, scalable, production-grade code in Python or a performance-oriented language (Go, Rust, C++).
  • Architectural fluency across data stores, distributed systems, caching, and data transfer protocols.
  • Data engineering skills: comfort building data processing pipelines and using SQL (Airflow, BigQuery, Postgres).
  • Deep cloud infrastructure and networking experience across one or more platforms (GCP, AWS).
  • Familiarity with ML platform tooling (e.g., MLflow) and model lifecycle processes.
  • Terraform experience: ability to write and modify modules, understand state and backends.
  • Observability instincts: comfortable instrumenting across hardware, application, and model layers.
  • Inference performance tuning experience with high-throughput GPU services.
  • Strong written communication skills for runbooks, design docs, PR descriptions, and postmortems.

Nice To Haves

  • Hands-on work experience with audio or media systems.
  • Experience with signal processing.
  • Experience with detecting synthetic (artificial) speech.
  • Experience with computer vision.
  • GPU work beyond running inference (CUDA, kernels, drivers, cluster operations).
  • Experience with streaming systems (Kafka, Pub/Sub, Kinesis, or similar).

Responsibilities

  • Build, tune, and ship inference containers, including Dockerfiles, dependencies, image size, cold-starts, GPU access, and multi-cloud orchestration (ECS, Cloud Run, GKE, EKS).
  • Optimize in-container performance and resource utilization on GPUs, including concurrency tuning, VRAM accounting, request timeouts, queueing, rate limiting, and multi-GPU distribution.
  • Build and run scale and stress testing scenarios to characterize performance, identify breaking points, and inform autoscaling and instance-sizing decisions.
  • Operate Terraform stacks across multiple clouds (GCP, AWS), managing networking, identity, GPU nodes, autoscaling, and per-tenant configurations.
  • Build and extend the customer-facing API layer, handling client authentication, rate limiting, data isolation, and request metering.
  • Maintain and extend data orchestration pipelines for model evaluation, customer reporting, and operational dashboards.
  • Build and tune metrics, dashboards, logging, and alarms across inference services, running instances, and deployed models.
  • Ensure CI/CD discipline, including cloud OIDC, image signing, pinned versions, and reproducible CI.
  • Instrument across hardware, application, and model layers for observability (latency, throughput, score distributions, drift).