About The Position

The Roblox Storage team powers the data foundation behind every experience on the platform. As a Principal Software Engineer for Data Access, you will architect and build the next‑generation managed OLTP data access layer, engineered for extreme scale, global availability, and uncompromising security. You will design core infrastructure capable of serving hundreds of millions of queries per second while ensuring reliable, low‑latency access to data worldwide. The challenges are large‑scale, highly technical, and business‑critical: you will shape foundational storage systems, evolve platform‑wide scalability and reliability, and provide strong technical leadership that raises engineering quality across teams. This work requires deep innovation, clear architectural vision, and a passion for distributed systems, and it will redefine how Roblox engineers build, access, and trust data at scale.

Requirements

  • Strong experience designing and delivering large-scale distributed systems handling millions of real-time requests per second.
  • Deep data management knowledge in one or more of the following technologies: RDBMS (CockroachDB, SQL Server, Postgres, MySQL, RDB), Caching (Redis), Kafka, KV store (DynamoDB, Cassandra).
  • Strong experience building deployment pipelines on top of container orchestrators like Kubernetes or Nomad and service discovery systems like Consul.
  • Strong experience with programming languages like Rust, Go, Java, or C++.
  • Strong scripting and test automation abilities.
  • Experience with telemetry stacks, like Grafana, Prometheus, AlertManager, and Kibana.
  • BS degree (or equivalent professional experience) in Computer Science, with at least 7 years of hands-on experience.

Responsibilities

  • Partner with Product, Engineering, and Security teams to define long‑term strategy and technical requirements for the Data Access platform.
  • Lead the architecture, implementation, and operation of our storage Infra‑as‑a‑Service offerings, setting the engineering bar for scalability, reliability, and system hardening across teams.
  • Improve and scale our large distributed 24x7 services and deliver features with urgency, cost efficiency, zero downtime, and high reliability.
  • Design and build frameworks or tools to automate development, testing, deployment, management, and monitoring of mission-critical services.
  • Collaborate with partner teams, producing project work plans, measurable metrics, delivery milestones, rollout plans, on-call alerts, and runbooks while leveraging the existing technology stack.
  • Pay close attention to creating high-quality, reusable code, keeping development continuous without compromising site reliability.
  • Improve SLAs across the platform and reduce end‑to‑end rollout time for critical storage and data-access features.