About The Position

Cerebras Systems is a pioneer in large-scale AI supercomputers. These multi-exaflop systems are deployed in some of the largest datacenters and are built using our Wafer-Scale Cluster technology, a cluster of several Wafer-Scale Engine (WSE) chips. The Cluster engineering team is responsible for delivering all cluster-related software.

Requirements

  • Strong track record in software architecture, system design, and development.
  • Strong track record of development for distributed clusters.
  • Strong understanding of the Kubernetes (K8s) software ecosystem, Prometheus, and Grafana.
  • Strong development skills in Go, Python, and Bash.
  • Strong skills in debugging distributed systems.
  • Strong skills in developing tests for new features and regression tests for existing features.

Responsibilities

  • Automated bare-metal configuration of networking, OS, and application software in large clusters of Cerebras WSE systems, servers, and switches.
  • Push-button workflows for cluster upgrades, downgrades, and security patching, with key metrics to minimize cluster downtime.
  • An orchestration and scheduling system for resource allocation, job submission, and job placement in a multi-user environment on a cluster.
  • Seamless support for both on-premises and cloud deployment and operations.
  • A robust system for monitoring, detecting, and handling failures across a variety of cluster resources (including High Availability of clusters).
  • Broad cluster and job monitoring and visualization capabilities, along with alerting systems.
  • User-facing tools to monitor the status of jobs and collect metrics.
  • Administrator-facing tools to manage and operate large clusters.

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Education Level: None listed
  • Number of Employees: 251-500
