About The Position

UniversalAGI is hiring an Infrastructure Engineer to build and own the execution platform powering our research and customer deployments: data generation, simulation orchestration, training/fine-tuning infrastructure, benchmarking pipelines, and production deployments in customer environments. You’ll work closely with the CEO and founding team to turn research into repeatable, scalable, reliable systems, both internally and in customer infrastructure. This is a “ship outcomes” role: your work directly determines how fast we can iterate, how reproducible our results are, and how reliably we deliver in production.

Requirements

  • Strong software engineering skills (clean code, debugging, reliability, reproducibility).
  • Hands-on experience building/operating infrastructure for ML/compute-heavy workflows: pipelines, job orchestration, GPU compute, storage, CI/CD, monitoring.
  • Olympic athlete mindset: you hold yourself to high standards and are obsessed with measurable improvement on the metrics you deliver to customers.
  • Resourcefulness: you know when to do the “quick & correct” fix vs. when to invest in a robust solution, and you can justify the tradeoff with impact.
  • Ownership: you are comfortable owning work end-to-end and being accountable for measurable outcomes.

Nice To Haves

  • Experience with workflow orchestration (e.g., Ray, Kubernetes, Slurm).
  • Experience with GPU infrastructure and distributed training systems.
  • Experience building evaluation/benchmarking frameworks with strong reproducibility guarantees.
  • Experience deploying into regulated / security-sensitive environments (gov/defense/enterprise).
  • Experience with simulation/HPC pipelines (CFD, meshing, batch workloads) is a plus but not required.
  • Experience in an FDE-style / delivery execution role (or similar “ship results fast” environments).

Responsibilities

  • Build the foundation platform (internal):
      • Build and operate scalable infrastructure for data generation and simulation workflows (job orchestration, scheduling, queues, retries, observability).
      • Build reproducible pipelines for training/fine-tuning and benchmarking (artifact/version management, experiment tracking, dataset lineage).
      • Own cost/performance tradeoffs across compute, storage, networking, and runtime efficiency.
  • Deploy to customers (external):
      • Lead deployments of our stack into customer cloud/on-prem environments, including secure networking, permissions, and data movement.
      • Build robust deployment patterns: environment provisioning, CI/CD, rollbacks, monitoring, and incident response.
      • Partner with customers to ensure reliability and repeatability under real-world constraints (security, compliance, infra limits, data governance).

Benefits

  • Competitive compensation and equity.
  • Competitive health, dental, and vision benefits paid by the company.
  • 401(k) plan offering.
  • Flexible vacation.
  • Team building and fun activities.
  • Great scope, ownership and impact.
  • AI tools stipend.
  • Monthly commute stipend.
  • Monthly wellness / fitness stipend.
  • Daily office lunch & dinner covered by the company.
  • Immigration support.