Sr. Site Reliability Engineer

Applied Systems, Inc.
Remote

About The Position

Applied Systems is transforming the insurance industry by building a team dedicated to learning, innovation, and delivering indispensable software and services to customers. With over 40 years of experience in insurtech, the company aims to redefine what's achievable and create a workplace where career growth is fostered. The Senior Site Reliability Engineer will join the SRE team, playing a critical role in ensuring the reliability, scalability, and efficiency of software applications to deliver best-in-class services to insurance agencies and carriers, empowering them to streamline operations, improve customer experiences, and drive business growth.

Requirements

  • 5+ years of experience in DevOps, SRE, or Infrastructure Engineering roles
  • Strong foundation in incident management, troubleshooting, and observability of software applications
  • Experience with cloud platforms (GCP, AWS, Azure), including traffic management solutions
  • Familiarity with distributed systems, microservices architecture, and related technologies
  • Proficiency in Python, Go, Bash, and PowerShell
  • Expertise in Windows and Linux system administration
  • Advanced knowledge of IaC tools such as Terraform (including HCL and Terraform CDK with TypeScript) and Packer
  • Knowledge of CI/CD pipelines and version control systems (GitLab, GitHub Actions, etc.)
  • Familiarity with monitoring tools (Datadog) and security solutions (HashiCorp Vault, Cloud Armor)
  • Experience with SQL Server and PostgreSQL for database management
  • Kubernetes expertise, including Helm charts and ArgoCD for application deployment and orchestration
  • Excellent communication skills to collaborate with engineers, product managers, and business stakeholders
  • Strong organizational skills and attention to detail
  • Ability to prioritize tasks and make accurate decisions under pressure
  • Passion for mentoring and guiding team members

Responsibilities

  • Develop and maintain IaC using Terraform, Terraform CDK with TypeScript, Packer, and Ansible to automate on-prem and cloud infrastructure provisioning and management
  • Collaborate with development and platform teams to design scalable, reliable systems with fault tolerance, high availability, and performance optimization
  • Implement and manage monitoring and tracing instrumentation using Datadog to ensure system performance and adherence to SLIs, SLOs, and SLAs
  • Utilize HashiCorp Consul for service discovery, dynamic configuration, and network automation across distributed systems
  • Define and implement best practices for disaster recovery and high availability across hybrid environments
  • Build and maintain CI/CD pipelines using tools like GitLab and GitHub Actions to streamline deployments and ensure code quality
  • Automate repetitive tasks to increase efficiency and reduce human error, using languages such as Python, Go, Bash, and PowerShell
  • Manage Kubernetes environments, including Helm charts and ArgoCD for application deployment and orchestration
  • Mentor junior engineers, lead technical discussions, and collaborate across teams to drive consensus on design decisions and technical initiatives
  • Create and maintain accurate documentation for workflows, procedures, and infrastructure standards to support internal teams and customers
  • Participate in the on-call rotation to provide production support and resolve complex engineering challenges
  • Work with third-party vendors to evaluate and integrate their products and services into the infrastructure ecosystem

Benefits

  • Medical, Dental, and Vision Coverage
  • Holiday and Vacation Time
  • Health & Wellness Days
  • A Bonus Day for Your Birthday
  • Additional Compensation Plans Such as Bonus and Commission