Senior Software Engineer - SRE

Socure - Carson City, NV
$160,000 - $180,000

About The Position

Socure is building the identity trust infrastructure for the digital economy: verifying 100% of good identities in real time and stopping fraud before it starts. The mission is big, the problems are complex, and the impact is felt by businesses, governments, and millions of people every day. We hire people who want that level of responsibility. People who move fast, think critically, act like owners, and care deeply about solving customer problems with precision. If you want predictability or narrow scope, this won’t be your place. If you want to help build the future of identity with a team that holds a high bar for itself, keep reading.

We are hiring exceptional Site Reliability Engineers who take pride in building and operating mission-critical, production-grade systems. This role is for engineers who own what they build, thrive in high-pressure environments, and continuously raise the reliability and operational bar. You will work at the intersection of cloud infrastructure, Kubernetes, automation, and observability, with a strong focus on preventing incidents rather than reacting to them.

Requirements

  • Deep AWS expertise: networking, compute, IAM, scaling, and security
  • Strong experience managing infrastructure using Terraform at scale
  • Very strong Kubernetes fundamentals (internals, scheduling, networking, storage)
  • Hands-on experience operating Amazon EKS in production environments
  • Experience troubleshooting complex, multi-layer Kubernetes issues
  • Ability to write clean, maintainable, production-quality code in Go/Python
  • Strong automation mindset — eliminating toil through code
  • Proven experience building and operating CI/CD pipelines
  • Hands-on experience with GitHub (Actions or integrations)
  • Hands-on experience with ArgoCD and GitOps-based deployment workflows
  • Strong understanding of observability principles: metrics, logs, traces, and alerting
  • Hands-on experience with Datadog or a similar tool for infrastructure and Kubernetes monitoring
  • Hands-on experience with Datadog or a similar tool for application performance monitoring (APM)
  • Hands-on experience with Datadog or a similar tool for alerting, dashboards, and incident detection
  • Experience defining and using SLIs/SLOs to drive reliability decisions
  • Ability to turn observability data into actionable operational improvements

Responsibilities

  • End-to-end ownership of highly available, scalable AWS infrastructure
  • Design, operation, and continuous improvement of Kubernetes (EKS) platforms
  • Reliability of production systems through strong observability, automation, and SLOs
  • CI/CD systems that enable safe, fast, and repeatable deployments
  • Infrastructure defined and enforced through Terraform and GitOps
  • Incident response, root cause analysis, and long-term remediation
  • Raising operational standards through automation, documentation, and best practices