Staff Site Reliability Engineer

Core Scientific, Austin, TX

About The Position

We are seeking a capable, motivated generalist who thrives in a change-controlled, compliant environment and enjoys working across hybrid cloud and on-premises systems. This role partners closely with application architecture and peer engineering teams while contributing hands-on across platform engineering, DevOps, and SRE. The person in this role is expected to take ownership of complex technical initiatives and see them through to completion, balancing hands-on implementation with effective delegation and cross-team coordination.

Requirements

  • Bachelor's degree in Computer Science or a related field, 7+ years of experience, or equivalent demonstrated impact in SRE, DevOps, or Infrastructure Engineering
  • Broad technical experience across infrastructure and distributed systems, with the ability to design effective solutions, apply appropriate patterns, and anticipate scaling, reliability, and operational challenges
  • Strong understanding of distributed systems behavior, including application runtime characteristics, service-to-service communication, networking, and failure modes in production environments
  • Experience operating in regulated, compliant, or change-controlled environments
  • Experience working in hybrid environments (AWS preferred; on-premises infrastructure required)
  • Strong experience with Infrastructure as Code, configuration management, and orchestration tools (Terraform, Helm, Kustomize, Ansible)
  • Experience with Kubernetes and virtualization technologies
  • Experience with observability platforms (e.g., Datadog), including building monitoring and alerting integrations
  • Experience with build and release systems (e.g., GitHub Actions, Makefiles, Python tooling)

Responsibilities

  • Lead end-to-end delivery of complex technical initiatives, from problem definition and design through implementation, rollout, and operation
  • Own the design, implementation, and reliability of systems across hybrid cloud and on-premises environments
  • Take accountability for technical outcomes, including system reliability, scalability, and performance in regulated, change-controlled environments
  • Drive execution by coordinating work across engineers and teams, delegating effectively while remaining hands-on where needed
  • Partner with application architecture and peer teams to shape system design and influence technical decisions
  • Build, deploy, and operate infrastructure and applications using automation and infrastructure as code
  • Implement secure, immutable infrastructure using modern tooling (e.g., Terraform, Kubernetes, Helm, Ansible)
  • Improve observability, monitoring, and incident response practices
  • Establish and promote best practices for reliability, security, and operational excellence across teams
  • Mentor engineers and contribute to raising the technical bar across the organization
  • Foster open, respectful, and professional communication within the team as well as with co-workers, teammates, and leaders across the organization
  • Perform other duties as assigned