Senior AI Platform Engineer

Firestorm
$165,000 - $225,000 · Remote

About The Position

At Firestorm, we are building autonomous aerial systems that operate where they are needed most, when they are needed most. Our mission requires speed, ingenuity, and a relentless commitment to engineering excellence. We move fast, test constantly, and deliver capability that performs in the real world, not just in simulation.

We are looking for a Senior AI Platform Engineer who is excited to build the platform foundations that make AI-enabled software reliable, secure, and operable at scale. You will design and implement core services, registries, workflow orchestration primitives, and the integration patterns that connect AI-driven workflows to internal systems, without compromising governance, auditability, or safety. This is a hands-on role with significant ownership: you will build the primitives that other engineers depend on, and you will help set the standard for reliability and software quality in a fast-moving environment.

If you want to build systems that matter, own your work end to end, and be part of a team that values bold thinking grounded in rigorous engineering, Firestorm is the place to do it.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent practical experience)
  • U.S. Citizenship and the ability to obtain and maintain a U.S. Government security clearance
  • 6+ years of experience building and operating backend/platform systems in production
  • 3+ years building platforms that support AI/ML systems in production (e.g., evaluation pipelines, model/app runtime infrastructure, artifact/metadata registries, AI workflow orchestration, MLOps)
  • Experience operating LLM-enabled systems with production constraints (latency, cost, reliability), including monitoring quality/regressions and enforcing safe tool/data access
  • Strong proficiency in one or more backend languages (e.g., Go, Java, Python, C++) and modern infrastructure practices
  • Experience designing APIs, data models, and distributed systems with reliability and security best practices
  • Experience with workflow/event systems (queues, pub/sub, orchestration, idempotency, state machines) used to run multi-step AI-driven pipelines
  • Experience implementing observability for AI systems (metrics, tracing, logs) including quality/reliability signals beyond uptime (e.g., eval scores, rejection rates, cost/latency budgets)
  • Experience with production security controls: RBAC/ABAC, audit logs, secrets management, data access boundaries
  • Strong communication and documentation skills

Nice To Haves

  • Experience with AI/ML infrastructure (model serving, inference gateways, feature/data pipelines, experiment tracking, artifact registries)
  • Experience with GPU-aware infrastructure and/or high-throughput inference (capacity planning, batching, caching, rate limiting)
  • Experience building evaluation platforms (offline/online evals, canaries, A/B testing, regression automation, dataset/version management)
  • Familiarity with AI safety/security patterns (prompt injection mitigation, tool sandboxing, policy enforcement, data-loss prevention)
  • Experience building internal platforms used by multiple teams with clear contracts, SLAs/SLOs, and well-managed migrations
  • Experience with multi-tenant or role-based access control systems

Responsibilities

  • Build and operate core backend services that power AI-enabled workflows (APIs, orchestration, storage, and internal integrations)
  • Design scalable data models and registries for versioned artifacts and metadata, with strong traceability and auditability
  • Implement secure-by-default service patterns: authN/authZ, audit logs, secrets handling, and least-privilege access
  • Build reliability foundations: observability, metrics, tracing, alerting, SLOs, incident response playbooks
  • Implement idempotent APIs and state-handling patterns for resilient workflows (retries, partial failure, reconciliation)
  • Create integration adapters and event-driven plumbing to safely connect workflows to internal systems
  • Establish release and deployment practices: CI/CD pipelines, environment promotion, rollback strategies, and safe migrations
  • Partner closely with AI and application engineers to define interfaces, validation layers, and operational constraints
  • Identify performance, scalability, and security risks early and ship pragmatic solutions quickly

Benefits

  • We offer comprehensive medical, dental, and vision plans
  • 401(k) Retirement Savings Plan to invest in your long-term retirement goals
  • Equity grants for new hires
  • Unlimited PTO
  • Extremely generous company holiday calendar, including a holiday hiatus in November and December
  • Generous Parental Leave
  • Lifestyle Spending Account
  • FSA
  • DCFSA
  • HSA
  • Hospital Indemnity insurance
  • Critical Illness insurance
  • Accident insurance
  • Basic Life/AD&D, short-term and long-term disability insurance, 100% covered by Firestorm. Plus, the option to purchase additional life insurance for you and your family.
  • Mental Health Resources: We provide free mental health resources 24/7 including therapy and more. Additional work-life services, such as free legal and financial support, are available to you as well