Site Reliability Engineer

Gallup, San Francisco, CA
$150,000 - $200,000 (Hybrid)

About The Position

Build Gallup's observability foundation and shift how we detect, respond to and prevent system issues before they affect customers. As a founding member of Gallup’s new site reliability engineering team, you’ll define and scale our observability strategy across engineering and bring reliability engineering principles — automation, observability and continuous improvement — to everything we build. You’ll unify different teams’ monitoring solutions into a cohesive, proactive approach, consolidate our tooling, build automated workflows and establish processes that help us catch problems before they become incidents. In this role, you’ll shape Gallup’s global technology platform to ensure the systems delivering analytics and insights to millions remain fast, resilient and always available. If you’re eager to drive resilience in systems that empower people and organizations worldwide, this is your opportunity — apply today.

Requirements

  • Bachelor's degree in computer science, MIS or a related field, or equivalent experience, required
  • At least three years of experience in site reliability engineering, DevOps or infrastructure roles with a focus on monitoring and observability required
  • Experience with observability and monitoring tools such as Dynatrace (preferred), Datadog, Grafana or similar platforms required
  • Experience with incident management tools like PagerDuty or similar alerting systems required
  • Strong understanding of AWS cloud infrastructure and how to monitor distributed systems required
  • Experience integrating monitoring and alerting systems with collaboration platforms like Slack required
  • Ability to work with application teams across multiple languages and frameworks (e.g., Java, .NET, Python) required
  • Knowledge of metrics, logging and tracing as pillars of observability required
  • Experience writing scripts or automation (e.g., Python, Bash, PowerShell) to support monitoring workflows required
  • A commitment to working on-site at Gallup’s San Francisco office at least three days a week required

Nice To Haves

  • Observability expertise: You've built or scaled monitoring and observability practices, not just maintained existing systems.
  • Tool consolidation experience: You've successfully unified fragmented monitoring solutions across multiple teams.
  • AI mindset: You reduce repetitive operational work through thoughtful automation and workflow design.
  • Incident response leadership: You've designed or improved incident management processes and know how to balance speed with thoroughness.
  • Communication and enablement: You go beyond building dashboards; you guide others in how to instrument their code and interpret metrics.
  • Containers and infrastructure as code: You have experience with containerized applications and infrastructure as code.

Responsibilities

  • Establish the foundation of Gallup’s SRE function by defining standards, best practices and scalable systems that will grow with the organization
  • Build and evolve observability infrastructure using tools like Dynatrace, Datadog, Grafana and PagerDuty to monitor applications running on AWS
  • Design and implement automated alerting workflows that integrate directly with Slack
  • Establish incident response processes that integrate monitoring, alerting and team communication to reduce recovery time and improve service continuity
  • Create dashboards and metrics that give engineering teams real-time insight into application performance and system reliability
  • Identify opportunities for automation and design self-healing systems in partnership with DevOps engineers
  • Enable end-to-end monitoring and faster issue detection by partnering with application teams to embed observability into Java, .NET and Python services
  • Lead initiatives that help engineering teams adopt and use observability tools effectively
  • Identify patterns in system behavior that indicate potential issues before they affect customers

Benefits

  • Gallup offers a robust benefits package that includes medical, dental, vision, life and other insurance options; a fully vested 401(k) retirement savings plan with company matching; an employee stock ownership program; mass transit reimbursement; family-building benefits; an employee assistance program; and various reimbursements and activities that enhance our associates’ wellbeing.