Site Reliability Engineer

Transform9, Birmingham, AL

About The Position

At Transform9, we are dedicated to transforming healthcare access and patient communication through our innovative conversational agent platform. Our mission is to provide seamless experiences for patients and healthcare providers alike. To support our growing platform, we are seeking a Site Reliability Engineer to ensure the health, performance, and reliability of our systems. In this role, you will work collaboratively with the development and operations teams to build and maintain scalable infrastructure, automate processes, and enhance the overall availability of our services. Your expertise will be vital in creating a robust environment that can support our ambitious growth in the healthcare sector.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience.
  • 5+ years of experience in site reliability engineering, DevOps, or a related software engineering role.
  • Strong understanding of cloud infrastructure (AWS, Azure, etc.) and of containerization and orchestration technologies (Docker, Kubernetes).
  • Experience with infrastructure as code tools (Terraform, Ansible, etc.) for automating deployments.
  • Proficiency in scripting and programming languages such as Python, Go, or Bash.
  • Familiarity with monitoring and observability tools (Prometheus, Grafana, ELK stack).
  • Excellent problem-solving skills and the ability to work effectively in high-pressure situations.

Responsibilities

  • Design, implement, and maintain scalable and reliable systems to support the Transform9 platform and services.
  • Monitor system performance, respond to incidents, and troubleshoot issues to ensure optimal uptime and reliability.
  • Build and manage CI/CD pipelines to facilitate smooth deployments and automate workflows.
  • Collaborate with development teams to establish best practices in system architecture, deployment, and monitoring.
  • Implement observability solutions to gain insights into system performance and user experience.
  • Participate in on-call rotations to respond to system alerts, perform root cause analysis, and implement remediation strategies.

Benefits

  • Health Care Plan (Medical, Dental & Vision)
  • Retirement Plan (401k, IRA)
  • Paid Time Off (Vacation, Sick & Public Holidays)
  • Family Leave (Maternity, Paternity)
  • Training & Development
  • Free Food & Snacks
  • Stock Option Plan