Developer Experience Engineer

Etched · San Jose, CA
Onsite

About The Position

We are looking for a Developer Experience Engineer to enhance developer productivity, automation, and infrastructure across our hardware and software teams. You will work at the intersection of DevOps, software engineering, and high-performance computing (HPC), building systems that accelerate chip design, simulation, and AI model deployment in a cloud and on-prem hybrid environment.

Requirements

  • Strong Python skills for automation, scripting, and infrastructure development.
  • Experience with Slurm job scheduling in an HPC or hybrid environment.
  • Hands-on experience with observability and monitoring tools like Prometheus, Grafana, and OpenTelemetry.
  • Expertise with Docker and Kubernetes, including Helm charts and cluster management.
  • Proficiency in modern CI/CD pipeline management with tools like GitHub Actions, Jenkins, or Buildkite.
  • Experience with infrastructure-as-code tools like Terraform or Ansible.
  • Knowledge of cloud infrastructure, compute, and storage optimization on AWS or GCP.

Nice To Haves

  • Data pipelining for AI/ML workflows using Airflow, Prefect, or Dagster.
  • Build system expertise with Bazel, CMake, or distributed build systems.
  • Secrets management tools such as Vault, SOPS, AWS Secrets Manager, or GCP Secret Manager.
  • AI/ML model training workflows and monitoring GPU-accelerated workloads.
  • Exposure to FPGA or ASIC development environments and workflows.

Responsibilities

  • Develop and maintain automation tools to streamline development, testing, and deployment workflows.
  • Optimize and manage Slurm-based job scheduling for AI workloads, simulation, and chip design workflows.
  • Build observability solutions using Grafana, Prometheus, and OpenTelemetry for monitoring pipelines, infrastructure, and compute clusters.
  • Manage and optimize containerized environments using Docker and Kubernetes to enhance scalability and reproducibility.
  • Enhance build, test, and deployment pipelines using CI/CD tools such as GitHub Actions, Jenkins, or Buildkite, and build systems such as Bazel.
  • Develop caching and artifact management systems to reduce build times and improve dependency resolution.
  • Integrate and manage cloud resources (AWS, GCP) for scaling compute, storage, and hybrid workloads.
  • Support security and compliance efforts including secrets management and access control.
  • Document and share best practices for efficient developer tooling and workflows.

Benefits

  • Full medical, dental, and vision packages, with generous premium coverage
  • Housing subsidy of $2,000/month for those living within walking distance of the office
  • Daily lunch and dinner in our office
  • Relocation support for those moving to West San Jose
© 2024 Teal Labs, Inc