Principal SRE

Chalkboard, New York, NY

About The Position

Chalkboard is building the future of sports gaming. Our mission is to blur the line between watching and playing by turning real-money sports gaming into a social, immersive experience built for fans who play to win. We're not just creating another betting app. We're reimagining how sports fans engage with the games they love. At our core, we're a team of sports-obsessed builders who value clarity, fairness, and the thrill of helping fans turn insight into action.

We're looking for a Principal Site Reliability Engineer to join Chalkboard and help us build a platform that is reliable, scalable, and easy for teams to build on. In this role you'll work closely with Engineering, Product, and Data teams, playing a meaningful part in how millions of fans experience sports in real time.

If you're someone who loves building from scratch, thrives in fast-moving environments, and wants to win as a team, not just as an MVP, keep reading!

Requirements

  • Cloud Infrastructure (GCP preferred): networking, IAM, databases, storage
  • Kubernetes: cluster operations and workload management
  • Infrastructure as Code: Terraform, Helm
  • CI/CD: GitHub Actions or similar
  • Observability: metrics, logging, tracing, alerting
  • 8+ years of experience in SRE, platform engineering, or infrastructure roles
  • Strong experience with distributed systems and backend architectures
  • Proven ability to improve system reliability, scalability, and performance
  • Experience building and improving CI/CD pipelines and deployment workflows
  • Strong debugging skills using data (logs, metrics, traces)
  • Experience leading incident response and driving root cause analysis
  • Ability to make pragmatic tradeoffs between speed, reliability, and scale
  • Experience partnering across engineering teams to improve developer velocity

Nice To Haves

  • Experience with Go or backend frameworks like Nest.js
  • Experience with Datadog or similar observability platforms
  • Familiarity with Postgres, MongoDB, Firestore, or Redis
  • Experience with messaging systems like RabbitMQ
  • Experience with GitOps tools (FluxCD, Kustomize)
  • Passion for sports, gaming, or betting products

Responsibilities

  • Own platform reliability end-to-end, proactively identifying and mitigating risks before they impact users
  • Build and evolve observability (metrics, logs, tracing) to enable fast detection, diagnosis, and resolution of issues
  • Scale infrastructure ahead of demand by identifying bottlenecks and implementing durable architecture improvements
  • Reduce developer friction by improving CI/CD pipelines, deployment workflows, and internal tooling
  • Lead incident response and root cause analysis, driving systemic fixes—not just short-term patches
  • Establish and enforce best practices for infrastructure, deployments, and system reliability
  • Build reusable, self-service infrastructure that enables teams to ship quickly and safely
  • Continuously improve systems through automation and Infrastructure-as-Code

Benefits

  • Comprehensive medical, dental, and vision coverage starting within 30 days, with the majority of premiums covered by Chalkboard
  • 401(k) with company match
  • Lunch on us every day with a corporate DoorDash account
  • Refuel in the office with protein shakes, energy drinks, and a snack buffet
  • Flexible time-off policy, plus 10 company holidays and work from home (WFH) during the holidays

What This Job Offers

  • Job Type: Full-time
  • Career Level: Principal
  • Education Level: Not listed
  • Number of Employees: 1-10
