Site Reliability Engineer

The Voleon Group
Berkeley, CA
19d
$115,000 - $135,000

About The Position

As a Site Reliability Engineer (SRE), you will work at the intersection of production operations and software development as you improve, manage, and monitor production-critical infrastructure and data pipelines. At Voleon, many SREs serve together on a Production Operations team tasked with improving shared production infrastructure. Others are embedded with teams of software engineers to improve specific production systems owned by those teams. Voleon SREs work on important real-world problems and collaborate with passionate and talented colleagues in an empowering, results-driven environment. This role is a way to make a real difference: your contributions will make our critical systems more reliable, lower operational risk, and increase the efficiency of our engineering effort.

Requirements

  • Experience with coding and debugging Python
  • Experience with Linux
  • Familiarity with relational databases and SQL
  • Sharp analytical and problem-solving skills and a persistent drive to make things work (better)
  • Strong growth mindset and a passion for learning
  • Strong technical communication skills
  • Attention to detail
  • 2 years of relevant industry experience
  • An undergraduate degree or comparable training in a quantitative field, or equivalent relevant industry experience

Nice To Haves

  • Familiarity with best practices concerning code maintainability, documentation, quality assurance, continuous integration and deployment
  • Experience supporting production systems
  • Experience with any of the following: gRPC microservices, Postgres, Pandas, Golang, R, Git, Jenkins, Bazel, Prometheus, Grafana, Airflow, Kubernetes

Responsibilities

  • Improve fault-tolerance and maintainability of code in proprietary data pipelines and trading systems
  • Diagnose and fix bugs in code
  • Lead complex deployments
  • Automate manual workflows
  • Track and prioritize outstanding production-related issues
  • Share an on-call rotation responding to incidents to ensure the continuous operation of production-critical systems

Benefits

  • medical, dental, and vision coverage
  • life and AD&D insurance
  • 20 days of paid time off
  • 9 sick days
  • a 401(k) plan with a company match