Senior Software Engineer - SRE

Socure•Carson City, NV

1d•$160,000 - $180,000

About The Position

Socure is building the identity trust infrastructure for the digital economy, verifying 100% of good identities in real time and stopping fraud before it starts. The mission is big, the problems are complex, and the impact is felt by businesses, governments, and millions of people every day.

We hire people who want that level of responsibility. People who move fast, think critically, act like owners, and care deeply about solving customer problems with precision. If you want predictability or narrow scope, this won’t be your place. If you want to help build the future of identity with a team that holds a high bar for itself, keep reading.

We are hiring exceptional Site Reliability Engineers who take pride in building and operating mission-critical, production-grade systems. This role is for engineers who own what they build, thrive in high-pressure environments, and continuously raise the reliability and operational bar. You will work at the intersection of cloud infrastructure, Kubernetes, automation, and observability, with a strong focus on preventing incidents rather than reacting to them.

Requirements

Deep AWS expertise: networking, compute, IAM, scaling, and security
Strong experience managing infrastructure using Terraform at scale
Very strong Kubernetes fundamentals (internals, scheduling, networking, storage)
Hands-on experience operating Amazon EKS in production environments
Experience troubleshooting complex, multi-layer Kubernetes issues
Ability to write clean, maintainable, production-quality code in Go/Python
Strong automation mindset — eliminating toil through code
Proven experience building and operating CI/CD pipelines
Hands-on experience with GitHub (Actions or integrations)
Hands-on experience with ArgoCD and GitOps-based deployment workflows
Strong understanding of observability principles: metrics, logs, traces, and alerting
Hands-on experience with Datadog or a similar tool for infrastructure and Kubernetes monitoring
Hands-on experience with Datadog or a similar tool for application performance monitoring (APM)
Hands-on experience with Datadog or a similar tool for alerting, dashboards, and incident detection
Experience defining and using SLIs/SLOs to drive reliability decisions
Ability to turn observability data into actionable operational improvements

Responsibilities

End-to-end ownership of highly available, scalable AWS infrastructure
Design, operation, and continuous improvement of Kubernetes (EKS) platforms
Reliability of production systems through strong observability, automation, and SLOs
CI/CD systems that enable safe, fast, and repeatable deployments
Infrastructure defined and enforced through Terraform and GitOps
Incident response, root cause analysis, and long-term remediation
Raising operational standards through automation, documentation, and best practices
