About The Position

We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join our team. As an SRE, you will be responsible for ensuring the reliability, scalability, and performance of our systems and services. You will work closely with software engineering, DevOps, and infrastructure teams to build and maintain robust systems that support our mission-critical applications.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent experience).
  • 3+ years of experience in Site Reliability Engineering, DevOps, or related roles.
  • Strong knowledge of cloud platforms (AWS, Azure, GCP).
  • Proficiency in scripting and programming languages (Python, Go, Bash, etc.).
  • Experience with containerization and orchestration tools (Docker, Kubernetes).
  • Familiarity with CI/CD tools and practices, including experience with Harness or Jenkins.
  • Expertise in monitoring and logging tools (Prometheus, Grafana, ELK, etc.).
  • Solid understanding of networking, security, and system architecture.

Nice To Haves

  • Experience with Infrastructure as Code (Terraform, CloudFormation).
  • Knowledge of distributed systems and microservices architecture.
  • Strong analytical and problem-solving skills.
  • Excellent communication and collaboration abilities.

Responsibilities

  • Design, implement, and maintain scalable and highly available infrastructure.
  • Develop and maintain monitoring, alerting, and incident response systems.
  • Collaborate with development teams to improve system reliability and performance.
  • Automate operational tasks and improve deployment pipelines.
  • Conduct root cause analysis and postmortems for incidents.
  • Participate in on-call rotations and incident bridges, and respond to production incidents.
  • Continuously improve system observability and operational practices.