Sr. Member of Technical Staff

Cerebras Systems, Sunnyvale, CA
Remote

About The Position

Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry-leading training and inference speeds and lets machine learning users run large-scale ML applications without the hassle of managing hundreds of GPUs or TPUs. Cerebras' current customers include top model labs, global enterprises, and cutting-edge AI-native startups. OpenAI recently announced a multi-year partnership with Cerebras to deploy 750 megawatts of scale, transforming key workloads with ultra-high-speed inference. Thanks to the wafer-scale architecture, Cerebras Inference offers the fastest Generative AI inference solution in the world, over 10 times faster than GPU-based hyperscale cloud inference services. This order-of-magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via additional agentic computation.

Requirements

  • Master’s degree or foreign equivalent degree in Computer Science or a related field, and 18 months of experience as an Information Security Analyst, Software Engineer, Sr. Member of Technical Staff, IT Senior Applications Engineer, or in a related occupation, required.
  • Infrastructure-as-Code and deployment automation: Terraform, AWS CloudFormation, AWS CDK, and Ansible;
  • Containerization and orchestration: Docker, Kubernetes, AWS EKS, AWS Elastic Container Service (ECS), AWS Fargate, and Helm;
  • Compute and serverless services: AWS EC2, AWS Lambda functions, and Auto Scaling Groups;
  • Monitoring, logging, and distributed tracing: AWS CloudWatch, AWS X-Ray, ELK (Elasticsearch, Logstash, Kibana), Prometheus, and Grafana;
  • Programming languages and frameworks: Python, Node.js, JavaScript, and Flask;
  • Data storage and caching: PostgreSQL, Redis, and NFS;
  • CI/CD and version control: Jenkins and Git.

Responsibilities

  • Design and develop software features that support system resiliency and high availability, including automated recovery mechanisms and fault-tolerant architecture across distributed environments.
  • Develop and maintain cloud-based deployment workflows for AI inference software using AWS tools and services to support low-latency and scalable system performance.
  • Develop Python-based scripts and APIs to streamline data preprocessing, inference execution, and post-processing for real-time inference tasks.
  • Use parallel programming techniques (e.g., multi-threading, asynchronous processing) to maximize resource efficiency on AWS compute instances.
  • Develop software components to support visualization and analysis of system performance metrics, enhancing the monitoring and usability of inference services.
  • Develop inference software in Docker containers and define Kubernetes orchestration strategies that ensure software reliability and efficient scaling.
  • Develop automated scripts to detect and mitigate common failure modes, improving software system reliability.
  • Debug issues related to model deployment, container orchestration, and networking configurations, documenting steps to reproduce and root-cause defects.
  • Triage and resolve defects in the software service by analyzing logs, metrics, and distributed traces using tools like AWS CloudWatch, Grafana, or custom Python scripts.
  • Work with product management and user experience teams to define requirements for inference service interfaces, including configuration, monitoring, and event logging.
  • Author detailed technical documentation for infrastructure configurations, inference workflows, and APIs, ensuring clarity for internal teams and external customers.
  • Document and track defects, enhancements, and release notes using tools like Jira and Git, ensuring version control and traceability.

Benefits

  • Job stability with startup vitality
  • Simple, non-corporate work culture that respects individual beliefs