Principal Software Engineer - Storage Cache

Roblox
San Mateo, CA
Onsite

About The Position

Every day, tens of millions of people come to Roblox to explore, create, play, learn, and connect with friends in 3D immersive digital experiences, all created by our global community of developers and creators. At Roblox, we’re building the tools and platform that empower our community to bring any experience that they can imagine to life. Our vision is to reimagine the way people come together, from anywhere in the world, and on any device. We’re on a mission to connect a billion people with optimism and civility, and looking for amazing talent to help us get there. A career at Roblox means you’ll be working to shape the future of human interaction, solving unique technical challenges at scale, and helping to create safer, more civil shared experiences for everyone.

Roblox's Cache team is building a next-generation caching solution designed to deliver sub-millisecond average latency, horizontal scalability, and high efficiency, all at a drastically lower cost. Our ultimate vision is to shape a caching infrastructure capable of supporting 1 billion Daily Active Users while reducing costs by 90%. We are turning hours of onboarding and capacity expansion into seconds, freeing service owners entirely from managing cluster lifecycles.

As a Principal Engineer on the Cache team (part of the Infra Storage org), you will innovate and operate large-scale, in-house distributed systems to solve Roblox's ever-growing caching challenges. You will report directly to the Engineering Manager for the Cache team.

Requirements

  • A BS degree in Computer Science (or equivalent professional experience) with at least 8 years of hands-on software engineering experience.
  • Deep domain knowledge in building and operating large-scale distributed systems.
  • A strong builder mindset with proven experience running Active/Active distributed systems on container orchestrators like Kubernetes or Nomad.
  • Strong, hands-on programming experience in Go and C++.
  • Proven success in resolving massive-scale bottlenecks, such as overcoming the limitations of decentralized gossip protocols or mitigating partial failures in distributed systems.
  • Hands-on experience with modern telemetry and observability stacks (e.g., Prometheus, Grafana, Alertmanager, Kibana).

Nice To Haves

  • A track record of contributing to or maintaining major open-source caching projects such as Redis, Valkey, or Memcached.
  • Experience extending cache functionality (e.g., writing custom Redis modules in C/Rust, complex Lua scripting) or deep-tuning underlying memory allocators like jemalloc.
  • Experience with caching proxies (e.g., Twemproxy, Envoy Redis filter) and designing complex, multi-tiered caching architectures.

Responsibilities

  • Lead the architectural transition to a next-generation, multitenant caching service built on Valkey, ensuring strict data, resource, and failure isolation for all tenants.
  • Drive systemic optimizations to mitigate head-of-line blocking, manage hot keys, and maximize CPU and memory utilization across physical machine clusters.
  • Design and build robust frameworks to automate development, chaos testing (fault/latency injection), and monitoring for 24x7 mission-critical services, targeting 99.99%+ availability and elastic scalability.
  • Champion engineering best practices by leading design reviews, performance benchmarking, failure drills, and blameless post-incident retrospectives.
  • Mentor and empower engineers, fostering a culture of deep domain expertise and seamless knowledge sharing across the Storage, Platform, and Product teams.

Benefits

  • Equity compensation
  • Health insurance
  • Dental insurance
  • Vision insurance