Senior Agentic (AI) Engineer

Worth AI • Orlando, FL

Remote

About The Position

Worth AI is hiring a Senior Agentic AI Engineer to design and ship production agent systems that automate KYB, underwriting, and risk decisions on regulated financial data. You’ll own agents end-to-end (architecture, retrieval, tools, evals, and production deployment) and partner closely with our Chief AI Officer, applied scientists, and platform teams.

Requirements

5+ years of software engineering experience, with 2+ years building production LLM or agentic systems (not just notebooks or demos).
Hands-on experience with a modern agent framework (LangGraph strongly preferred) and a track record of shipping agents that run, fail gracefully, and recover.
Strong RAG fundamentals (chunking, embeddings, hybrid retrieval, reranking, grounding) and judgment about when RAG isn’t the right answer.
Real eval experience: golden sets and offline and online evaluations, used to make ship/no-ship calls.
Production MLOps fluency: deployed LLM workloads under real latency, cost, and reliability constraints.
Strong Python; comfortable in TypeScript / Node.js.
Solid systems engineering instincts: APIs, async patterns, queues, databases, and distributed-system failure modes.
Calibrated communicator; thrives in ambiguous, fast-moving environments.

Nice To Haves

Prior experience in fintech, lending, payments, KYB/KYC, fraud, or AML.
Experience building MCP servers or other structured tool interfaces for LLMs.
Background in classical ML (ranking, scoring, calibration).
Experience designing explainable / auditable AI workflows for regulated environments.
Open-source contributions to agent frameworks, eval tooling, or retrieval libraries.
AWS depth (EKS, MSK, RDS, S3, Lambda) and IaC with Terraform.

Responsibilities

Design and ship multi-step agentic systems (planner/executor, tool-using, multi-agent, human-in-the-loop) for onboarding, underwriting, case review, and continuous monitoring.
Architect agent graphs in LangGraph (or comparable — CrewAI, AutoGen, Claude Agent SDK) with explicit state, durable execution, retries, and safe fallbacks.
Build the retrieval layer powering our agents — chunking, hybrid search, reranking, and grounded citation.
Own the eval stack: golden sets, offline regression suites, LLM-as-judge, online A/B and shadow evals, and red-teaming for jailbreaks, prompt injection, and PII leakage.
Expose agents to production systems via well-typed tools and MCP servers. Treat tool surface area as a product.
Drive production MLOps: deployment, versioning, traffic shaping, cost/latency budgets, tracing, and on-call playbooks for agent incidents.
Partner with security and compliance to keep agents inside SOC 2, GDPR, CCPA, and fair-lending posture — auditability and explainability built in, not bolted on.
Mentor engineers on agent patterns, prompt hygiene, eval discipline, and LLM failure modes.