ML Infrastructure Engineer

Sygaldry Technologies, San Francisco, CA

About The Position

Sygaldry Technologies is building quantum-accelerated AI servers to exponentially speed up training and inference for AI. By integrating quantum and AI, we're accelerating the path to superintelligence and addressing the problem of rising compute costs and energy bottlenecks. Sygaldry AI servers combine multiple qubit types within a single, fault-tolerant architecture to deliver the combination of cost, scale, and speed necessary for advanced AI applications. We pioneer new domains in physics, engineering, and AI, tackling the hardest challenges with a grounded, optimistic, and rigorous culture. We're looking for individuals ready to define the intersection of quantum and AI and drive its profound global impact.

About The Role

Our AI & Algorithms team is growing fast: research scientists, applied mathematicians, and quantum algorithm researchers developing the algorithms that will accelerate and transform AI. They need compute infrastructure that stays out of their way: GPU access that's reliable, experiments that are reproducible, and workloads that scale without requiring each researcher to become a cloud expert.

You'll build and manage the compute platform this team runs on. The workloads are diverse -- quantum circuit simulation, large-scale numerical optimization, model training, tensor network contractions, and high-throughput data generation -- and run across multiple cloud providers and on-prem GPU servers. You'll own the full stack, from cloud provider configuration to the Python APIs that researchers use to launch jobs.

Requirements

  • Think in systems: you see how compute, storage, networking, and cost interact
  • Care about developer experience: you've felt the pain of bad research infrastructure
  • Are pragmatic about tooling: right tool for the job, no over-engineering
  • Take ownership: you want to own a critical function with autonomy
  • Write things down: you document decisions and create runbooks

Nice To Haves

  • Deep AWS experience (EC2, S3, IAM, CloudFormation or Terraform)
  • GPU compute management (instance types, spot strategies, multi-GPU, distributed training)
  • Python-based ML and scientific computing tooling (PyTorch, JAX)
  • GCP and/or Modal experience
  • MLOps or research computing platforms (MLflow, W&B, Kubeflow, or HPC job schedulers)
  • CI/CD pipeline management (GitHub Actions, containers)
  • Hybrid cloud / on-prem GPU cluster management
  • Experience supporting research teams with heterogeneous computing needs

Responsibilities

  • Build compute abstractions that handle the team's diverse workloads: GPU-accelerated simulation, distributed training, high-throughput CPU jobs, and interactive analysis -- across PyTorch, JAX, and scientific computing frameworks
  • Stand up experiment tracking and reproducibility infrastructure
  • Create developer tooling that makes cloud compute feel local: environment setup, job submission, monitoring, and artifact management
  • Scale experiments from single-GPU prototyping to multi-node production runs
  • Design multi-provider workload orchestration: route jobs based on cost, availability, and capability
  • Manage and optimize spend across cloud providers -- track credit balances, burn rates, and expiration dates
  • Configure hybrid local + cloud workflows as on-prem GPU infrastructure comes online
  • Coordinate with our infrastructure engineer on cloud administration and security
  • Build CI/CD pipelines for research workloads: automated testing, evaluation benchmarks, artifact management
  • Create data generation and preprocessing pipelines at the throughput the team's simulators demand
  • Set up monitoring, alerting, and cost dashboards that surface problems before researchers hit them

Benefits

  • Visa Sponsorship - We know what it takes to make top talent thrive here. We’re open to supporting visas whenever possible.
  • Compensation - We value your contribution and invest in your future with a competitive salary and meaningful equity.
  • Benefits - Your well-being matters. We provide company-sponsored health coverage to give you and your family peace of mind.
  • Connection - Whether it’s a company offsite or casual crew socials, we make time to connect, recharge, and have fun together.
  • Time Off - We trust you to take the time you need. Unlimited PTO so you can rest, recharge, and come back ready to make an impact.