About The Position

At Netflix, our mission is to entertain the world. Together, we are writing the next episode - pushing the boundaries of storytelling, global fandom and making the unimaginable a reality. We are a dream team obsessed with the uncomfortable excitement of discovering what happens when you merge creativity, intuition and cutting-edge technology. Come be a part of what’s next.

The Test Automation Platform team (TAP) provides the core infrastructure and capabilities to enable automated testing of the Netflix product at scale. Our Device & Test Automation platform is used to enable other teams to qualify and validate the Netflix TV, mobile, and web client applications, partner device implementations, mobile games, and more. We view ourselves as a force multiplier for Netflix engineering, providing composable capabilities and pluggable abstractions that allow teams to manage, orchestrate, and analyze their automated tests and devices. Our platform executes and ingests results for over 3 million test executions daily.

You will join the Infrastructure & Operations pod within TAP. This pod develops and operates the foundational services and infrastructure that underpin the Test Automation platform, covering service deployment and delivery, core datastores, observability and alerting, CI/CD and developer tooling, and the reliability and efficiency of the platform as it scales.

Requirements

  • You think in terms of platforms and paved paths and like building opinionated, reusable patterns and libraries that other engineers can rely on.
  • You have a strong infrastructure and backend engineering background and enjoy working across services, storage, compute, and operations for large-scale platforms.
  • You have experience working with cloud provider technologies (AWS, GCP, Azure).
  • You have experience with distributed systems fundamentals such as latency, throughput, backpressure, retries, idempotency, and consistency and availability tradeoffs.
  • You have significant experience in at least one backend programming language.
  • You understand networking details of client/server communication and can debug issues involving HTTP, TCP, DNS, etc.
  • You take initiative and drive projects with dedication.
  • You excel in collaborative settings and use your strong communication skills to influence outcomes.

Nice To Haves

  • Experience with MongoDB sharding, indexing, performance tuning, and data lifecycle management.
  • Experience with infrastructure for large-scale test or CI systems, including scheduling, queuing, parallel execution, and resource-aware scaling.
  • Experience building or operating data pipelines and ETL from operational data stores into analytics systems such as Iceberg/BDP, data lakes, or data warehouses.
  • Familiarity with resilience engineering practices, including failure injection, DR and multi-region strategies, incident reviews, and Linux systems debugging from the command line.

Responsibilities

  • Design, build, and operate backend services and infrastructure that power the Test Automation platform, with a focus on reliability, scalability, and cost efficiency.
  • Own and evolve core infrastructure components such as service deployment and delivery and platform datastores (MongoDB).
  • Standardize and modernize service infrastructure by moving services onto paved paths for observability, provisioning, capacity management, and security.
  • Analyze and optimize critical systems like MongoDB for capacity, performance, and cost, including sharding, version upgrades, and data lifecycle strategies (TTL, archival, hot/cold storage).
  • Improve operational excellence by rationalizing metrics and alert sources, enhancing dashboards and alerts, and building runbooks.
  • Drive resilience and reliability initiatives such as load testing, failure injection testing, disaster recovery and high-availability strategies, and post-incident improvements grounded in cost/benefit tradeoffs.
  • Collaborate with partner teams on paved paths, storage and compute offerings, and ETL/data pipelines.

Benefits

  • Netflix provides comprehensive benefits including Health Plans, Mental Health support, a 401(k) Retirement Plan with employer match, Stock Option Program, Disability Programs, Health Savings and Flexible Spending Accounts, Family-forming benefits, and Life and Serious Injury Benefits.
  • We also offer paid leave of absence programs.
  • Full-time hourly employees accrue 35 days of paid time off annually, to be used for vacation, holidays, and sick time.
  • Full-time salaried employees are immediately entitled to flexible time off.