About The Position

We’re looking for a Principal Site Reliability Engineer who combines deep technical expertise with strategic vision. This role leads the design, architecture, and delivery of next-generation, highly available, secure, and performant systems across cloud environments. You’ll own large-scale DevOps initiatives—from defining reference architectures to automating complex CI/CD pipelines—and guide both business and engineering teams toward resilient, scalable, and cost-efficient solutions. As the technical authority on cloud operations, automation, and infrastructure security, you’ll set best practices, mentor teams, and drive continuous improvement across AWS, Azure, and emerging technologies.

Requirements

  • 5+ years of production cloud operations experience.
  • 5+ years of expertise with the Linux command line.
  • 5+ years of using Terraform for automation. Hands-on with automation and actively seeking out opportunities to automate manual processes.
  • 5+ years of strong, hands-on experience building production services in AWS.
  • 5+ years of experience with Kubernetes configuration management.
  • 5+ years of experience working with various databases.
  • 3+ years of experience with Docker and building Docker images.
  • 3+ years of experience working with Artifactory and Jenkins.
  • 3+ years of experience with scripting using Python and Bash.
  • Significant experience with CI/CD automation tools such as Bitbucket, Jenkins, Artifactory.
  • Good understanding of Java and Python applications.
  • Significant experience with configuration management tools such as Ansible, Salt, Puppet, or Chef.
  • Ability to adapt quickly to changing priorities in a fast-paced work environment.
  • Experience using AI to improve accuracy, efficiency and speed of Cloud Operations, Security and Support tasks.
  • Ability to participate in an on-call rotation.

Nice To Haves

  • Network: firewalls, load balancers, routers
  • Log management and monitoring: ELK, Grafana, Prometheus

Responsibilities

  • Design, architect, and implement next-generation, highly available, performant, and secure system architectures and automation solutions.
  • Implement, maintain, and improve Continuous Integration and Continuous Delivery environments.
  • Own and lead initiatives to define, design, and implement DevOps solutions, including reference architectures, estimates, and costing.
  • Manage databases, including MySQL, MongoDB, Postgres, and RDS, with expert-level knowledge and hands-on experience.
  • Advise business and technology delivery leadership on how to translate the client’s infrastructure and automation business requirements into executable technological solutions.
  • Participate in customer workshops and present proposed solutions.
  • Act as a subject matter expert on DevOps best practices as related to AWS and Azure.
  • Analyze best practices and emerging concepts in DevOps, Infrastructure Automation, and Enterprise Security.
  • Act as a technical liaison between clients, service engineering teams, and support.
  • Review and audit existing solutions, designs, and system architecture, and participate in yearly security audits.
  • Advise on best practices in monitoring, alerting, and troubleshooting.
  • Create technical documentation and mentor team members.

What This Job Offers

Job Type

Full-time

Career Level

Mid Level

Education Level

No Education Listed

Number of Employees

5,001-10,000 employees
