About The Position

At Databricks, we are inspired to help data teams solve the world's toughest problems, from security threat detection to cancer drug development. We do this by building and running the world's best data and AI infrastructure platform, so our customers can focus on the high-value challenges that are central to their own missions. Our engineering teams build technical products that fulfill real, important needs in the world. We push the boundaries of data and AI technology while operating with the security and scale that are essential to making customers successful on our platform.

We develop and operate one of the largest-scale software platforms. The fleet consists of millions of virtual machines, generating terabytes of logs and processing exabytes of data per day. At our scale, we observe cloud hardware, network, and operating system faults, and our software must gracefully shield our customers from all of them.

As a software engineer on the Runtime Observability team, you will develop observability solutions that provide insights into the health and performance of our products and infrastructure.

Requirements

  • BS (or higher degree) in Computer Science or a related field
  • 4+ years of production-level experience in Java, Scala, C++, or a similar language
  • Experience developing software for large-scale distributed systems
  • Familiarity with metrics collection, health monitoring, and observability tools
  • Experience building relationships with developers and field engineers to facilitate assessment and mitigation of performance and reliability problems

Responsibilities

  • You will collaborate with different teams to identify metrics that allow engineers to observe how well the system and its subcomponents are performing.
  • You will build infrastructure that allows components to emit, log, and aggregate metrics, which can then be displayed on dashboards and used to generate insights for debugging and performance analysis.
  • You will scale the observability solutions to support millions of instances and billions of queries per day.
  • You will develop processes and training for developers and field engineers to debug performance and reliability issues affecting customers.

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Industry: Professional, Scientific, and Technical Services
  • Number of Employees: 5,001-10,000