Senior Site Reliability Engineer

Attain

Posted:

August 14, 2023

Hybrid

Job Commitment

Full-time

Experience Level

Senior

Workplace Type

Hybrid

Job Function

Dev & Engineering

This job is closed

About the position

As a Senior Engineer on the SRE team at Attain, you will be responsible for building and maintaining the infrastructure that powers the company's systems. Your role will involve collaborating with various engineering teams to ensure optimal system performance and scalability. The ideal candidate for this position is someone who is comfortable wearing multiple hats, has a strong desire to automate processes, and is eager to learn and teach in a fast-paced environment. Preferred qualifications include experience with cloud-native infrastructure, containerization technologies, database and event streaming technologies, serverless computing technologies, infrastructure-as-code tools, observability tools, and strong computer science and software engineering fundamentals.

Responsibilities

Build and maintain the infrastructure that powers all systems, including supporting systems
Ensure that systems are running smoothly and operating at peak efficiency
Collaborate with engineering teams to handle future growth and scale
Wear multiple hats and be willing to learn and teach in a fast-paced environment
Automate processes and tasks
Give constructive feedback and seek out feedback in return
Work with containerization and service mesh technologies such as Docker, Kubernetes, Istio, ECS, AWS App Mesh, and Google Cloud Run
Work with database and event streaming technologies such as MySQL, Redis, Google BigQuery, Google Spanner, and Kafka
Work with serverless computing technologies such as AWS Lambda and Google Cloud Functions/Google Cloud Run
Use infrastructure-as-code tools such as Terraform
Utilize observability tools such as Datadog, Prometheus, and Grafana
Apply strong computer science and software engineering fundamentals

Requirements

4+ years of experience building and maintaining large-scale cloud-native infrastructure (AWS and/or GCP)
Experience working with containerization and service mesh technologies such as Docker, Kubernetes, Istio, ECS, AWS App Mesh, and Google Cloud Run
Experience with database and event streaming technologies such as MySQL, Redis, Google BigQuery, Google Spanner, and Kafka
Experience with serverless computing technologies such as AWS Lambda and Google Cloud Functions/Google Cloud Run
Experience with infrastructure-as-code tools such as Terraform
Experience with observability tools such as Datadog, Prometheus, and Grafana
Strong computer science and software engineering fundamentals

Benefits

Working with cutting-edge infrastructure technologies such as AWS and GCP
Opportunity to work with containerization and service mesh technologies like Docker, Kubernetes, Istio, ECS, AWS App Mesh, and Google Cloud Run
Hands-on experience with database and event streaming technologies such as MySQL, Redis, Google BigQuery, Google Spanner, and Kafka
Exposure to serverless computing technologies like AWS Lambda and Google Cloud Functions/Google Cloud Run
Hands-on work with infrastructure-as-code tools such as Terraform
Hands-on experience with observability tools like Datadog, Prometheus, and Grafana
Collaboration with engineering leads on critical functionality monitoring
Automation implementation to reduce reliance on manual processes
Participation in architecture design and capacity planning discussions
Building, maintaining, and improving CI/CD pipelines
Writing Terraform modules for deploying infrastructure resources
Developing Helm charts for deploying services and jobs in a Kubernetes cluster
Defining metrics, network policies, and routing rules for Istio service mesh
Monitoring and maintaining GCP BigQuery and Spanner databases
Using a Google-managed Prometheus instance for metrics and building Grafana dashboards and alerts
Opportunity to experiment with GCP offerings, third-party vendors, and open-source tools for automation and security