Senior Site Reliability Engineer

Latitude AI · Palo Alto, CA

About The Position

Latitude AI (lat.ai) develops automated driving technologies, including L3, for Ford vehicles at scale. We’re driven by the opportunity to reimagine what it’s like to drive and make travel safer, less stressful, and more enjoyable for everyone. When you join the Latitude team, you’ll work alongside leading experts across machine learning and robotics, cloud platforms, mapping, sensors and compute systems, test operations, and systems and safety engineering, all dedicated to making a real, positive impact on the driving experience for millions of people.

As a Ford Motor Company subsidiary, we operate independently to develop automated driving technology at the speed of a technology startup. Latitude is headquartered in Pittsburgh with engineering centers in Dearborn, Mich., and Palo Alto, Calif.

Meet the team: As a Site Reliability Engineer on the team, you will help build and run our mission-critical systems. Through monitoring and automation, you will ensure the health, reliability, scalability, and performance of the platforms. The Site Reliability team works with engineering teams, including ingest/data processing, mapping, labeling, triage, machine learning (detection, prediction, tracking), motion planning/control, offline simulation, and release/deployment, to provide uniform service observability and incident response.

Requirements

  • Bachelor's degree in Computer Engineering, Computer Science, Electrical Engineering, Robotics or a related field and 4+ years of relevant experience (or Master's degree and 2+ years of relevant experience, or PhD)
  • Fundamental understanding of Linux operating system internals, TCP/IP networking, and storage subsystems
  • Hands-on development experience in Go or Python, creating robust software that runs reliably in production
  • Strong experience scaling and securing services in the cloud (AWS, GCP) or cloud-native environments
  • Experience using infrastructure-as-code principles to automate the creation of infrastructure resources (e.g. Terraform, CloudFormation)
  • Experience authoring and maintaining Kubernetes Controllers in Go
  • Experience running Kubernetes and related core components in a large-scale, production environment
  • Experience with metrics (e.g. Prometheus), logging (e.g. Elasticsearch, Loki) and tracing (e.g. Jaeger, Tempo) systems
  • Understanding of engineering design limitations and the ability to guide teams in scaling their services to reach the desired performance within budget
  • A focus on increasing service reliability through defining and adhering to SLOs
  • Strong communication skills and the ability to work effectively in a diverse and distributed team

Responsibilities

  • Build monitoring to ensure our platform is healthy and its reliability measurable
  • Build alerting and a set of runbooks to enable faster detection and remediation of platform issues
  • Debug complex issues that may span multiple components of the stack and ensure proper fixes are implemented to prevent recurrence
  • Participate in an on-call rotation and a culture of continuous improvement through blameless postmortems
  • Design and implement components of the platform to enable features that make the work of our customers possible, simpler and more efficient
  • Build Kubernetes controllers to automate operations

Benefits

  • Competitive compensation packages
  • High-quality individual and family medical, dental, and vision insurance
  • Health savings account with available employer match
  • Employer-matched 401(k) retirement plan with immediate vesting
  • Employer-paid group term life insurance and the option to elect voluntary life insurance
  • Paid parental leave
  • Paid medical leave
  • Unlimited vacation
  • 15 paid holidays
  • Daily lunches, snacks, and beverages available in all office locations
  • Pre-tax spending accounts for healthcare and dependent care expenses
  • Pre-tax commuter benefits
  • Monthly wellness stipend
  • Adoption/Surrogacy support program
  • Backup child and elder care program
  • Professional development reimbursement
  • Employee assistance program
  • Discounted programs that include legal services, identity theft protection, pet insurance, and more
  • Company and team bonding outlets: employee resource groups, quarterly team activity stipend, and wellness initiatives