Staff Engineer — Agentic AI

Clera • San Francisco, CA

5d • $160,000–$250,000 • Onsite

About The Position

A well-funded, early-stage B2B SaaS company building AI agent infrastructure for mechanical engineering workflows is hiring a Staff Engineer — Agentic AI to own the core agent intelligence layer. This is a high-impact, senior technical leadership role reporting directly to the CTO. You'll sit at the intersection of applied agentic AI, user research, and product delivery — determining real-world value for Fortune 100 enterprise customers in the CAD, CAE, and PLM space. You'll lead a small team of AI engineers, a user researcher, and domain expert contractors, acting as a player-coach who writes production code and sets technical direction.

Requirements

7+ years in software engineering, including at least 2 years building agentic LLM-based systems that act in the real world (multi-step workflows, tool-calling, failure handling, cost constraints).
Deep experience with LLM application architecture: model selection, context window management, retrieval strategies, tool-calling frameworks, and orchestration patterns.
Strong evaluation and benchmarking instincts for agentic systems — task completion, cost efficiency, and failure mode analysis; familiarity with SWE-bench, GAIA, or τ-bench.
Proven track record of shipping AI systems with measurable outcomes, not just demos.
Proficiency in Python and the LLM tooling ecosystem (function calling, tool use APIs, tracing/observability tools such as Logfire or LangSmith, evaluation frameworks).
Experience leading a small technical team (3–6 engineers): setting direction, performing code reviews, and driving architecture decisions.

Nice To Haves

Experience with desktop automation, COM (Component Object Model), or other programmatic control of desktop applications (beyond web APIs).
Background in mechanical engineering, CAD/CAE, PLM, or adjacent industries.
Familiarity with enterprise deployment constraints on locked-down corporate workstations.
Published work or open-source contributions in agentic AI systems.
Experience building or contributing to public benchmarks for AI agents.

Responsibilities

Lead development of the core agent intelligence layer that executes multi-step workflows across complex desktop engineering software.
Own the full product loop: define agent capabilities from user stories, build implementations, and benchmark against real workflows.
Drive agent task success rate — define the evaluation framework, establish baselines, and systematically improve completion metrics.
Set and enforce per-task token budgets; track cost per completed workflow to ensure commercial viability.
Build rigorous, reproducible evaluation infrastructure grounded in validated user stories (SWE-bench-level rigor applied to engineering workflows).
Lead user story mapping and validation through interviews and close collaboration with domain experts.
Translate validated user stories into testable evals and close the loop between research and benchmarking.
Own agent architecture decisions: tool-calling strategies, state management, error recovery, model routing, and context management.
Set technical direction, review architecture decisions, unblock the team, and raise the engineering bar across a team of 3–6 engineers.
Collaborate cross-functionally with integrations, product, and customers during POCs to align agent behavior with real-world usage.