About The Position

As a Senior Site Reliability Engineer, you’ll play a critical role in shaping and scaling the infrastructure that powers our platform. You’ll work closely with engineering teams to ensure our systems are reliable, secure, and performant, with a strong focus on Kubernetes, AWS services, and infrastructure as code. Your expertise will help drive automation, improve developer velocity, and support the continued growth and maintenance of our cloud-native environment.

Requirements

  • Bachelor’s degree in Computer Science, Information Technology, or a related field — or equivalent practical experience.
  • 5+ years of experience in Site Reliability Engineering, DevOps, or related infrastructure roles.
  • Deep expertise in public cloud platforms, especially AWS, with hands-on experience in services like EC2, S3, Lambda, CloudWatch, and IAM.
  • Strong proficiency with Kubernetes and container orchestration — you’ve run production workloads and understand cluster management, scaling, and troubleshooting.
  • Extensive experience with Infrastructure as Code (IaC) using tools such as Terraform, Pulumi, or Crossplane.
  • Solid scripting or programming skills in languages like Python, Bash, or Go, with a strong focus on automation.
  • Excellent problem-solving and debugging skills, with a systems-thinking mindset.
  • Strong communicator who thrives in collaborative, remote-first teams.

Nice To Haves

  • Working knowledge of managed database services such as Amazon RDS or Aurora, or of PostgreSQL, is a plus, but infrastructure is your main game.

Responsibilities

  • Design, build, and operate scalable, highly available cloud infrastructure primarily on AWS.
  • Manage and evolve our Kubernetes environments to support the deployment and operation of modern, containerized applications.
  • Define and implement Infrastructure as Code (IaC) using tools like Terraform, CDK, or Crossplane.
  • Automate infrastructure provisioning, configuration, maintenance, and monitoring to reduce manual effort and improve reliability.
  • Apply best practices around security, observability, and cost optimization across infrastructure and services.
  • Manage and optimize database technologies, with a focus on Amazon RDS and Aurora.
  • Partner with development teams to
  • Investigate and resolve incidents, perform root cause analysis, and implement long-term fixes.
  • Participate in on-call rotations and provide support for critical production systems.
  • Contribute to SRE best practices, internal tooling, and team knowledge sharing.

Benefits

  • Paxos offers a competitive total compensation and benefits package, including equity and bonuses based on both your individual performance and company performance.
  • Eligibility for bonuses is dependent on job level, and actual salary within the range depends on your skills, experience, and qualifications.
© 2024 Teal Labs, Inc