Staff Site Reliability Engineer

Grindr • posted 4 months ago

Full-time • Mid Level

Chicago, IL

51-100 employees

This is a hybrid role based in our Chicago office and will require you to be in person Tuesdays and Thursdays. The Site Reliability Engineering (SRE) team at Grindr is responsible for ensuring our systems are stable, performant, and scalable as we continue to grow globally. This role reports directly to the Director of Technical Operations and plays a critical part in keeping our infrastructure running reliably while supporting both backend and operations teams. By driving improvements in automation, incident response, and performance optimization, this position ensures Grindr can deliver a safe, reliable, and seamless experience to millions of users worldwide. The team’s work directly impacts uptime, efficiency, and overall system resilience, supporting Grindr’s broader roadmap of building a secure and high-performing platform for the LGBTQ+ community.

Set up and maintain monitoring systems to track the health and performance of applications and infrastructure.
Create and manage alerting mechanisms to detect and respond to issues quickly.
Handle incidents and outages, working to resolve them swiftly and minimize downtime.
Perform root cause analysis to prevent future occurrences and improve system resilience.
Develop tools and scripts to automate repetitive tasks, such as deployments, monitoring, and scaling.
Analyze system performance and identify bottlenecks or areas for improvement.
Work with development teams to optimize code and infrastructure for better performance and resource utilization.
Plan for future growth by analyzing current usage trends and forecasting resource needs.
Define and measure SLOs and SLAs to set expectations for system reliability and performance.
Conduct post-mortems to document what went wrong and how to prevent similar incidents in the future.
Work closely with software developers to integrate reliability and performance into the development process.
Ensure that systems are secure and compliant with relevant regulations and standards.
Continuously look for ways to improve system reliability, performance, and efficiency.
Participate in an on-call rotation.

5+ years of experience in site reliability, including incident response, incident management, automation, and performance optimization.
5+ years of experience in cloud platforms (AWS preferred).
4+ years of experience working with DevOps technologies such as Docker, Kubernetes, Helm, and Terraform.
4+ years developing and maintaining CI/CD pipelines.
4+ years of experience using a scripting language such as Python or Bash.
Experience coding in Kotlin or another JVM language is a plus.

Proficient in at least one programming language (e.g., Python, Go, Java).
Strong knowledge of Linux/Unix systems.
Experience with cloud platforms (e.g., AWS, GCP, Azure).
Familiarity with containerization and orchestration tools (e.g., Docker, Kubernetes).
Understanding of networking concepts and protocols.
Experience with monitoring, logging, and alerting tools (e.g., Prometheus, Grafana, ELK stack).
Ability to implement and manage CI/CD pipelines.
Knowledge of infrastructure as code (e.g., Terraform, Ansible).
Proficiency in automated testing and deployment practices.
Understanding of SRE principles and practices, including SLAs, SLOs, and SLIs.
Knowledge of security best practices and compliance standards.
Experience with vulnerability assessment and mitigation.
Proven track record of maintaining high availability and performance in production environments.
Experience with incident management and post-mortem analysis.
Ability to optimize system performance and resource utilization.

Insurance premium coverage for health, dental, and vision for you and partial coverage for your dependents.
Generous 401(k) plan with a 6% match and immediate vesting in the U.S.
Industry-competitive compensation and eligibility for company bonus and equity programs.
Industry-leading gender-affirming offerings with up to 90% cost coverage, access to Included Health, monthly stipends for HRT, and more.
Flexible vacation policy, monthly stipends for cell phone, internet, wellness, food, and commuting, breakfast/lunch provided onsite, and yearly travel & leisure stipend.
