Senior Site Reliability Engineer
Attain · Posted: August 14, 2023 · Hybrid
About the position
As a Senior Engineer on the SRE team at Attain, you will build and maintain the infrastructure that powers the company's systems. You will collaborate with engineering teams across the company to keep those systems performant and scalable. The ideal candidate is comfortable wearing multiple hats, has a strong drive to automate, and is eager to learn and teach in a fast-paced environment. Qualifications include experience with cloud-native infrastructure, containerization and service mesh technologies, database and event streaming technologies, serverless computing, infrastructure-as-code tools, and observability tools, along with strong computer science and software engineering fundamentals.
Responsibilities
- Build and maintain the infrastructure that powers all of the company's systems, including supporting systems
- Ensure that systems are running smoothly and operating at peak efficiency
- Collaborate with engineering teams to plan for future growth and scale
- Wear multiple hats and be willing to learn and teach in a fast-paced environment
- Automate processes and tasks
- Give constructive feedback, and seek out feedback from others to keep improving
- Work with containerization and service mesh technologies such as Docker, Kubernetes, Istio, ECS, AWS App Mesh, and Google Cloud Run
- Work with database and event streaming technologies such as MySQL, Redis, Google BigQuery, Google Spanner, and Kafka
- Work with serverless computing technologies such as AWS Lambda and Google Cloud Functions/Google Cloud Run
- Use infrastructure-as-code tools such as Terraform
- Utilize observability tools such as Datadog, Prometheus, and Grafana
- Apply strong computer science and software engineering fundamentals
Requirements
- 4+ years of experience building and maintaining large-scale cloud-native infrastructure (AWS and/or GCP)
- Experience working with containerization and service mesh technologies such as Docker, Kubernetes, Istio, ECS, AWS App Mesh, and Google Cloud Run
- Experience with database and event streaming technologies such as MySQL, Redis, Google BigQuery, Google Spanner, and Kafka
- Experience with serverless computing technologies such as AWS Lambda and Google Cloud Functions/Google Cloud Run
- Experience with infrastructure-as-code tools such as Terraform
- Experience with observability tools such as Datadog, Prometheus, and Grafana
- Strong computer science and software engineering fundamentals
Benefits
- Working with cutting-edge cloud platforms such as AWS and GCP
- Opportunity to work with containerization and service mesh technologies like Docker, Kubernetes, Istio, ECS, AWS App Mesh, and Google Cloud Run
- Hands-on experience with database and event streaming technologies such as MySQL, Redis, Google BigQuery, Google Spanner, and Kafka
- Exposure to serverless computing technologies like AWS Lambda and Google Cloud Functions/Google Cloud Run
- Familiarity with infrastructure-as-code tools such as Terraform
- Hands-on experience with observability tools like Datadog, Prometheus, and Grafana
- Collaborating with engineering leads to monitor critical functionality
- Implementing automation to reduce reliance on manual processes
- Participation in architecture design and capacity planning discussions
- Building, maintaining, and improving CI/CD pipelines
- Writing Terraform modules for deploying infrastructure resources
- Developing Helm charts for deploying services and jobs in Kubernetes clusters
- Defining metrics, network policies, and routing rules for the Istio service mesh
- Monitoring and maintaining GCP BigQuery and Spanner databases
- Using a Google-managed Prometheus instance for metrics, and building Grafana dashboards and alerts
- Opportunity to experiment with GCP offerings, third-party vendors, and open-source tools for automation and security