Senior Site Reliability Engineer

Foundation Finance Company, LLC
Rothschild, WI
$120,000 - $125,000
Remote

About The Position

The Senior Site Reliability Engineer (Sr. SRE) ensures the reliability, performance, and scalability of all platform services. This role combines software engineering and operations expertise to build resilient systems, automate manual work, improve observability, and reduce operational risk. The Sr. SRE partners closely with DevOps, Release Engineering, and Security to embed reliability practices into every stage of the software lifecycle.

Requirements

  • Bachelor’s degree in computer science, engineering, or a related field and a minimum of 5 years’ experience in Site Reliability Engineering, DevOps, or Systems Engineering roles.
  • Experience with AWS (multi-account), Terraform, Ansible, CI/CD systems (GitHub Actions, Bitbucket, Jenkins, AWS CodeBuild, AWS CodePipeline), and observability platforms (New Relic, CloudWatch), as well as a background with containers (ECS/Fargate/EKS) and resilient architectures, is required.

Nice To Haves

  • AWS Certified DevOps Engineer or Solutions Architect.
  • Kubernetes or container certification.
  • SRE/DevOps practitioner certifications.

Responsibilities

  • Define and maintain Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets for critical services.
  • Lead capacity planning, performance tuning, design tuning and availability reviews with product and engineering teams to evaluate and develop system health. Run regular Game Day and disaster recovery exercises to validate platform resilience.
  • Eliminate manual “click-ops” by automating infrastructure provisioning, patching, and runtime hygiene using Terraform, Ansible, and CI/CD pipelines.
  • Develop tooling to enforce compliance (SOC2, CIS benchmarks) across environments.
  • Serve as the Tier-2 escalation point during incidents, resolving problems quickly and effectively and leading technical deep dives and root cause analysis.
  • Partner with the Incident Response team to improve playbooks, on-call rotations, and postmortems. Reduce mean time to detect (MTTD) and mean time to recovery (MTTR) through automation and proactive engineering.
  • Embed security best practices in system design (IMDSv2, hardened golden images, secret rotation). Work with Security to maintain least-privilege IAM policies, patch compliance, and audit readiness.
  • Identify sources of toil and drive automation to eliminate repetitive manual tasks. Contribute to platform “blueprints” and self-service modules so development teams can operate within reliable guardrails.
  • Measure system performance, and track and publish reliability metrics to leadership; use data to drive iterative improvements, minimize risk, and push system capabilities forward.
  • Other duties as assigned by management.
  • Must be able to come to work promptly and regularly.
  • Must be able to take direction and work well with others.
  • Must be able to work under the stress of deadlines.
  • Must be able to concentrate and perform accurately.
  • Must be able to react to change productively.

Benefits

  • Day-one Health Benefits (medical, dental, vision, and flexible spending options such as an HSA or FSA).
  • 401(k) with company match; enrollment on day one.
  • Paid, Sick, and Volunteer Time Off
  • Paid Parental Leave Options
  • Employer Paid Life and Disability
  • Wellbeing on Demand Program
  • Flexible Work Environment with a casual dress code