Sr. Staff SRE

Archer
San Jose, CA

About The Position

Archer is an aerospace company based in San Jose, California building an all-electric vertical takeoff and landing aircraft with a mission to advance the benefits of sustainable air mobility. We are designing, manufacturing, and operating an all-electric aircraft that can carry four passengers while producing minimal noise. Our sights are set high and our problems are hard, and we believe that diversity in the workplace is what makes us smarter, drives better insights, and will ultimately lift us all to success. We are dedicated to cultivating an equitable and inclusive environment that embraces our differences, and supports and celebrates all of our team members.

Sr Site Reliability Engineer

We are seeking a highly experienced and passionate Sr Staff Site Reliability Engineer (SRE) to join our growing team. In this critical role, you will be responsible for the reliability, scalability, performance, and security of our core systems and services. You will leverage your extensive expertise in various technologies to design, implement, and maintain robust infrastructure and automation solutions.

Requirements

  • 3+ years of experience in Site Reliability Engineering, DevOps, or a similar role with a strong focus on operational excellence.
  • Deep expertise in Amazon EKS, including cluster provisioning, management, and troubleshooting.
  • Extensive experience with observability tools and practices, including Prometheus, Grafana, ELK stack, or similar.
  • Proven track record in designing and implementing robust data pipelines (e.g., Kafka, Airflow, Spark).
  • Strong background in CI/CD methodologies and tools (e.g., Jenkins, GitLab CI, ArgoCD).
  • Expert-level knowledge of cloud platforms (AWS preferred), including infrastructure-as-code principles.
  • Comprehensive understanding of security best practices for cloud environments, applications, and data.
  • Proficiency in Docker for containerization and orchestration.
  • Advanced scripting and programming skills in Python, Bash, and PowerShell.
  • Solid understanding of networking concepts, distributed systems, and operating systems.
  • Excellent problem-solving, analytical, and communication skills.
  • Ability to work independently and as part of a highly collaborative team.
  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.

Nice To Haves

  • Experience with other Kubernetes distributions or cloud providers.
  • Familiarity with compliance frameworks (e.g., SOC 2, HIPAA, GDPR).
  • Certifications in AWS, Kubernetes, or other relevant technologies.

Responsibilities

  • Implement and maintain the infrastructure and pipeline required for an internal LLM-powered chat service, potentially leveraging platforms like OpenRouter or similar alternatives.
  • Implement and maintain highly available, scalable, and secure cloud-native infrastructure on Amazon Elastic Kubernetes Service (EKS).
  • Develop and implement comprehensive observability strategies, including monitoring, logging, and alerting, to ensure the health and performance of our systems.
  • Architect and optimize data pipelines to ensure efficient and reliable data flow across various platforms.
  • Drive the continuous improvement of our CI/CD pipelines, promoting best practices for automated testing, deployment, and release management.
  • Champion cloud-first strategies, leveraging the full capabilities of cloud platforms for infrastructure, services, and operations.
  • Implement and enforce robust security practices across our infrastructure, applications, and data.
  • Design and maintain Docker-based containerization solutions for our applications.
  • Develop and maintain automation scripts and tools using Python, Bash, and PowerShell.
  • Collaborate with development teams to ensure reliability is built into the software development lifecycle from inception.
  • Troubleshoot complex production issues across various layers of the stack, identifying root causes and implementing preventative measures.
  • Participate in on-call rotations to support production systems.

What This Job Offers

Job Type: Full-time
Career Level: Mid Level
Number of Employees: 501-1,000 employees
