Staff AI Platform Engineer

Invoca
Remote

About The Position

Invoca is seeking a Staff AI Platform Engineer to join its dynamic, fast-growing team. This role focuses on building the core platform that powers AI development at Invoca, rather than isolated AI features. The engineer will design and build reusable systems, abstractions, interfaces, and operating standards that enable product teams to develop, ship, and operate AI capabilities quickly and safely.

The position sits at the intersection of developer experience, backend systems, AI infrastructure, and platform architecture, with the aim of turning common AI development problems into durable platform capabilities. The ideal candidate brings a platform-as-a-product mindset, focusing on technical correctness, adoption, ergonomics, standardization, and long-term leverage, and creates opinionated golden paths that accelerate development without sacrificing reliability, observability, or governance.

The role is remote and open to candidates in the United States and Canada.

Requirements

  • 7+ years of professional experience in platform engineering within AI/ML disciplines, with a strong focus on building shared infrastructure, platform products, developer tooling, or foundational backend systems used by many engineers.
  • Experience with the infrastructure and framework layers behind AI systems, such as agent orchestration, model-serving abstractions, evaluation systems, retrieval infrastructure, prompt/context tooling, MCP/A2A stacks, and platform APIs for AI development.
  • Developer-facing product mindset: Care deeply about developer experience. Design systems, interfaces, and abstractions that engineers actually want to use.
  • Strong distributed systems and backend engineering skills: Highly effective in Python and/or TypeScript and comfortable designing resilient services, APIs, asynchronous systems, and distributed workflows.
  • API and contract design ability: Have designed robust, typed, well-structured interfaces such as REST, gRPC, SDKs, or platform contracts that support long-term extensibility and adoption.
  • AI systems thinking: Understand how frameworks like LangChain, LangGraph, or LlamaIndex work under the hood and know when to abstract them away; have built frameworks on top of them to solve common AI concerns for developers.
  • Experience with model infrastructure and operational concerns: Understand latency, throughput, reliability, token/cost management, model-provider integration, and the realities of running AI systems in production.
  • Experience with observability and evaluation: Built or contributed to tracing, quality measurement, automated evaluation, or other systems that help teams understand and improve AI system performance.
  • Platform standardization instincts: Know how to take repeated engineering problems and turn them into reusable golden paths, shared components, and organizational standards.
  • Familiarity with Kubernetes, Docker, and Terraform.
  • Intermediate-to-expert proficiency with TypeScript/JavaScript, React, and other web technologies.
  • Bachelor's Degree in Computer Science, Engineering, or a related field (or equivalent practical experience).

Nice To Haves

  • Experience with agent frameworks and orchestration runtimes such as LangGraph, LangChain, LiveKit, or similar systems
  • Experience with model gateways, inference infrastructure, or provider-routing layers
  • Experience with retrieval systems, vector databases, and context-management infrastructure
  • Familiarity with MCP-style interfaces, tool interoperability patterns, or workflow schemas
  • Experience with multi-tenant platform design, governance, or enterprise-grade controls
  • Experience creating reference implementations, templates, or internal developer platforms
  • An advanced degree (Master's or Ph.D.)

Responsibilities

  • Build the AI platform foundations: Design and maintain the core APIs, SDKs, libraries, and services that standardize how teams at Invoca build AI and agentic capabilities.
  • Create reusable agentic platform primitives: Build the platform components that product teams can assemble into real applications, such as orchestration building blocks, tool interfaces, workflow contracts, context-handling systems, and retrieval primitives.
  • Own platform-level evaluation infrastructure: Build the shared evaluation framework that enables teams to test AI systems consistently and at scale. This includes CI/CD integration, dataset and experiment workflows, scoring pipelines, and support for offline and online evaluation.
  • Standardize orchestration and interoperability: Help define the contracts, schemas, and interfaces that allow tools, runtimes, and services to work together cleanly. This includes platform patterns for MCP-style interoperability, tool discovery, and agent-to-agent or agent-to-system interactions.
  • Build for production scale and reliability: Design and improve the serving, routing, and control layers that sit between applications and model providers. Ensure low latency, high availability, cost efficiency, and strong production behavior.
  • Drive observability and governance: Build and enforce the platform capabilities that make AI systems measurable, debuggable, and governable in production, including tracing, auditing, policy enforcement, and operational standards.
  • Improve developer experience and platform adoption: Treat engineers and product teams as customers. Create tools, templates, documentation, and reference implementations that make the platform easy to adopt and hard to misuse.
  • Shape technical direction: As a staff engineer, influence architecture, standards, and long-term technical strategy for AI Platform. Identify repeated patterns across teams and convert them into reusable platform solutions.
  • Standardize context engineering: Develop the systems and tooling for prompt management, context versioning, templating, testing, and governance so teams can iterate safely and consistently on AI applications.

Benefits

  • Flexible Time Off
  • 16 U.S. paid holidays, including a winter break
  • Medical, dental, and vision coverage
  • Fertility assistance
  • 401(k) plan through Fidelity with a company match of up to 4%
  • Stock options
  • Mental Health Program through SpringHealth
  • Up to 6 weeks of 100% paid leave for baby bonding, adoption, and caring for family members
  • Up to 12 weeks of 100% paid leave for childbirth and medical needs
  • InVacation bonus after 7 years of service
  • Wellness Subsidy for gym memberships, fitness classes, and more