Senior Site Reliability Engineer, Currents

Braze
Austin, TX
Hybrid

About The Position

We're looking for a Senior Site Reliability Engineer for our Currents team, responsible for building, maintaining, and evolving Currents, our data export system, at scale. Currents is a robust Kafka-based event pipeline that handles tens of billions of messages daily and that our customers use to analyze user behavior in near real time. You’ll be a key engineer on a highly collaborative and skilled team, responsible for bringing projects from concept to production and improving our existing high-scale systems. You will draw on your experience, your skills, and a strong sense of teamwork to tackle the significant engineering challenges of running a critical data streaming system. As a Senior Site Reliability Engineer, you will focus specifically on the observability, scalability, and reliability strategy of every project.

Requirements

  • Bachelor’s in Computer Science, Software Engineering, or a related STEM field
  • Five (5) years of experience in any role/occupation/position involving software engineering or site reliability engineering
  • Using distributed systems such as Kubernetes or Docker Swarm to deploy and monitor live applications
  • Working with alerting software (Sentry, Datadog, and/or PagerDuty)
  • Utilizing programming languages (Java, Kotlin, and/or Ruby) to understand and contribute to the codebase
  • Storing data in relational and non-relational databases such as Postgres and MongoDB
  • Building data pipelines on data streaming or queuing systems such as Kafka, Sidekiq, or SQS and SNS
  • Leveraging continuous integration tools such as Jenkins or Buildkite
  • Collaborating with engineers through pull requests and code reviews in version control software such as GitHub or GitLab

Responsibilities

  • Solve live performance and reliability issues and prevent their recurrence
  • Write and review code, educating engineers and building a culture of reliability
  • Practice sustainable incident response and blameless postmortems
  • Define and enable standards for monitoring, reliability, and performance
  • Bridge the gap between infrastructure and platform engineering teams
  • Support and improve services by planning for scale and reliability
  • Guide junior engineers in SRE best practices, software engineering, and agile project leadership

Benefits

  • Competitive compensation that may include equity
  • Retirement and Employee Stock Purchase Plans
  • Flexible paid time off
  • Comprehensive benefit plans covering medical, dental, vision, life, and disability
  • Family services that include fertility benefits and equal paid parental leave
  • Professional development supported by formal career pathing, learning platforms, and a yearly learning stipend
  • A curated in-office employee experience, designed to foster community, team connections, and innovation
  • Opportunities to give back to your community, including an annual company-wide Volunteer Week and donation matching
  • Employee Resource Groups that provide supportive communities within Braze
  • Collaborative, transparent, and fun culture recognized as a Great Place to Work®