About The Position

Are you a seasoned engineer with a passion for reliability and scalability? We’re looking for exceptional Software Engineers to join the Reliability team at Roblox. In this pivotal role, you will drive the evolution of our systems, ensuring they meet the highest standards of performance, reliability, and efficiency. You’ll collaborate with cross-functional teams to build robust infrastructure that supports our growth. If you have a track record of solving complex technical challenges, we want to hear from you. Join us in shaping the future of our platform and delivering unparalleled value to our users. At Roblox, our vision is to achieve 1 billion daily active users. We believe this engineer will be instrumental in driving us towards that ambitious goal.

Requirements

  • Experience: you have a BS degree (or equivalent professional experience) in Computer Science or a related engineering field, with at least 3-4 years of experience. Experience in the Site Reliability space, in SRE or Software Engineering, is an added advantage.
  • Passion for systems: you have experience and good habits around building software and tools and getting them adopted. Your systems focus informs your view that code needs to be deeply reliable.
  • A Partner: you know that the best tools integrate broadly with the tooling ecosystem. You approach partners and processes with curiosity and seek to understand a problem deeply before you start coding.
  • A Coder: you have experience writing code in common programming languages (Go, C#, Java…).
  • Self-organized: you're excited about getting in front of complex problems, organizing your work by any means possible, overcoming emergent issues, and contributing to long-running projects as part of the team.
  • Problem Solver: you ask the right questions to solve issues within your expertise and you use data to test your theories.
  • Planner: you have experience with large project lifecycles. You have worked in sprints, broken down complex tasks into milestones, and reported status to keep project schedules accurate.

Nice To Haves

  • Prior experience developing, deploying and maintaining LLM-based agents or RAG systems in production is a plus.

Responsibilities

  • Create software and libraries that promote fault-tolerance and resilience.
  • Design and develop frameworks and tools that support performance testing and chaos experimentation and improve infrastructure resiliency.
  • Develop and implement performance monitoring and observability services to proactively identify and understand infrastructure issues and platform degradations.