Sr Software Engineer, Infrastructure

Databricks, San Francisco, CA
Hybrid

About The Position

At Databricks Information Technology, we are a product-led organization transforming the way we work, from how easy our IT services are to use to the applications we develop to help us scale seamlessly in the face of incredible growth. As a Senior Software Engineer (Infrastructure), you will be a core technical contributor on the IT Infrastructure team, owning and driving the evolution of our core infrastructure and observability platforms. This role requires a strong software engineering mindset, broad and deep technical expertise across SRE and infrastructure, and the ability to deliver high-quality, scalable solutions to problems in systems that are still immature. You will build resilient, scalable, and automated infrastructure that empowers our development teams. As a senior member of the team, you will bridge the gap between software engineering and systems architecture, ensuring our AWS environment is cost-optimized, secure, and highly available.

Requirements

  • 5+ years of production experience with strong proficiency in Python (non-negotiable).
  • Expert-level proficiency in Terraform (modules, state management) or Pulumi (preferred).
  • Hands-on experience with AWS (or Azure/GCP), Kubernetes, Docker, and containerization concepts.
  • Experience building and troubleshooting integrations between infrastructure, data pipelines, and observability platforms.
  • Advanced knowledge of GitHub Actions and GitHub runners.
  • Understanding of observability pillars (logging, metrics, tracing) and hands-on experience with tools like Datadog, Prometheus, or ELK.
  • Proficiency in running systems built on Kafka or other message queues.
  • Ability to operate with minimal guidance, take ownership of ambiguous projects, and execute independently against a vision set by tech leads.

Nice To Haves

  • Pulumi

Responsibilities

  • Design and deploy production-grade infrastructure on AWS using Terraform or Pulumi.
  • Manage and scale containerized workloads on Amazon EKS or Azure Kubernetes Service (AKS), focusing on cluster security and resource efficiency.
  • Architect robust deployment pipelines using GitHub Actions, managing both GitHub-hosted and self-hosted runners for specialized build requirements.
  • Build the underlying infrastructure that ensures new internal applications are secure and have logging and metrics enabled by default.
  • Build internal CLI tools, AI plugins, and automation scripts to streamline developer workflows and improve operational efficiency.
  • Collaborate with stakeholders across Security, Engineering, Infrastructure, and Support to deliver impactful projects with real business outcomes.
  • Participate in code reviews, document solutions and failure-triage playbooks, and mentor junior engineers on the platforms you own.

Benefits

  • Comprehensive benefits and perks that meet the needs of all of our employees
  • Eligibility for an annual performance bonus
  • Equity

Job Details

  • Job Type: Full-time
  • Career Level: Senior
  • Education Level: None listed
  • Company Size: 5,001-10,000 employees

© 2024 Teal Labs, Inc