Infrastructure and Platform Development Engineer

Tenstorrent
10d · $100,000 - $500,000 · Hybrid

About The Position

Tenstorrent is leading the industry in cutting-edge AI technology, revolutionizing performance expectations, ease of use, and cost efficiency. With AI redefining the computing paradigm, solutions must evolve to unify innovations in software models, compilers, platforms, networking, and semiconductors. Our diverse team of technologists has developed a high-performance RISC-V CPU from scratch and shares a passion for AI and a deep desire to build the best AI platform possible. We value collaboration, curiosity, and a commitment to solving hard problems. We are growing our team and looking for contributors of all seniorities.

Tenstorrent’s AI Software Infrastructure team builds the platforms that power internal development, orchestrate workloads, and manage large-scale AI hardware across on-prem data centers. This team develops and productionizes infrastructure used both internally and externally on Tenstorrent systems.

This role is hybrid, based out of Toronto, ON; Santa Clara, CA; Austin, TX; Belgrade, Serbia; or Warsaw, Poland.

We welcome candidates at various experience levels for this role. During the interview process, candidates will be assessed for the appropriate level, and offers will align with that level, which may differ from the one in this posting.

Requirements

  • Strong backend or infrastructure engineer with experience building and operating platforms on bare-metal or on-prem systems.
  • Deep experience with Kubernetes, including cluster provisioning, operations, and debugging production issues.
  • Proficient in Python or Go with experience building APIs and services.
  • Comfortable with Linux systems, networking fundamentals, and debugging distributed systems.
  • Collaborative and adaptable, able to work across teams in fast-moving environments.

Responsibilities

  • Design, build, and maintain infrastructure platforms for development workflows, workload orchestration, and ML services.
  • Develop APIs and services that enable internal teams to interact with infrastructure systems.
  • Productionize and scale Kubernetes-based platforms, including cluster management and operational maturity.
  • Integrate automation workflows using CI/CD pipelines and infrastructure-as-code tools such as Ansible and GitOps frameworks.
  • Collaborate with SRE, infrastructure, and deployment teams to support large-scale on-prem and customer-facing environments.