Sr. Staff SRE Engineer

Archer Aviation, San Jose, CA
Posted 37d ago
$163,000 - $204,000

About The Position

Please note that this job description is intended to provide a general overview of the position and does not include an exhaustive list of responsibilities and qualifications.

Requirements

  • 10+ years of experience in Site Reliability Engineering, DevOps, or a similar role with a strong focus on operational excellence.
  • Deep expertise in Amazon EKS, including cluster provisioning, management, and troubleshooting.
  • Extensive experience with observability tools and practices, including Prometheus, Grafana, ELK stack, or similar.
  • Proven track record in designing and implementing robust data pipelines (e.g., Kafka, Airflow, Spark).
  • Strong background in CI/CD methodologies and tools (e.g., Jenkins, GitLab CI, ArgoCD).
  • Expert-level knowledge of cloud platforms (AWS preferred), including infrastructure-as-code principles.
  • Comprehensive understanding of security best practices for cloud environments, applications, and data.
  • Proficiency in Docker for containerization and orchestration.
  • Advanced scripting and programming skills in Python, Bash, and PowerShell.
  • Solid understanding of networking concepts, distributed systems, and operating systems.
  • Excellent problem-solving, analytical, and communication skills.
  • Ability to work independently and as part of a highly collaborative team.
  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.

Nice To Haves

  • Experience with other Kubernetes distributions or cloud providers.
  • Familiarity with compliance frameworks (e.g., SOC 2, HIPAA, GDPR).
  • Certifications in AWS, Kubernetes, or other relevant technologies.

Responsibilities

  • Lead the design, implementation, and maintenance of highly available, scalable, and secure cloud-native infrastructure on Amazon Elastic Kubernetes Service (EKS).
  • Develop and implement comprehensive observability strategies, including monitoring, logging, and alerting, to ensure the health and performance of our systems.
  • Architect and optimize data pipelines to ensure efficient and reliable data flow across various platforms.
  • Drive the continuous improvement of our CI/CD pipelines, promoting best practices for automated testing, deployment, and release management.
  • Champion cloud-first strategies, leveraging the full capabilities of cloud platforms for infrastructure, services, and operations.
  • Implement and enforce robust security practices across our infrastructure, applications, and data.
  • Design and maintain Docker-based containerization solutions for our applications.
  • Develop and maintain automation scripts and tools using Python, Bash, and PowerShell.
  • Collaborate with development teams to ensure reliability is built into the software development lifecycle from inception.
  • Troubleshoot complex production issues across various layers of the stack, identifying root causes and implementing preventative measures.
  • Mentor and guide junior SREs, sharing knowledge and fostering a culture of operational excellence.
  • Participate in on-call rotations to support production systems.

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Industry: Transportation Equipment Manufacturing
  • Number of Employees: 501-1,000 employees
