Senior AI Engineer, Agentic Systems & Runtime Architecture

Voya Financial
New York, NY
Hybrid

About The Position

We’re looking for a hands-on Senior AI Engineer to lead the design, build, and operation of production agentic AI systems—including multi-agent research assistants that deliver cited, grounded answers via both conversational experiences and programmatic APIs. You’ll own “runtime architecture” decisions (orchestration/routing, retrieval strategy, model serving patterns, and runtime controls) and help evolve our capabilities toward more sophisticated agentic design: planner/supervisor orchestration, advanced retrieval + reranking, evaluation gates (AgentOps), agentic security, and end-to-end observability.

Requirements

  • Proven experience designing and building LLM-powered applications in production, including prompt/tool orchestration and grounded response patterns.
  • Hands-on experience implementing multi-agent orchestration (planner/supervisor patterns, tool chaining, state management, and conditional routing).
  • Strong understanding of advanced retrieval for RAG: hybrid retrieval, rank fusion concepts, and reranking, with bonus points for contextual retrieval/contextual embeddings approaches.
  • Demonstrated ability to build evaluation systems for non-deterministic AI/agent behavior (rubrics/metrics, regression suites, and release gates), replacing “vibe checks” with systematic improvement loops.
  • Experience with AgentOps / LLMOps practices, including staged rollout models and continuous monitoring for quality, drift, safety, and cost-per-task.
  • Strong security mindset for LLM applications, including awareness of prompt injection (direct and indirect) and defense-in-depth patterns (input sanitization, structured prompts, output validation, least privilege, HITL where appropriate).
  • Proficiency in Python and modern AI engineering frameworks commonly used for agentic systems (e.g., graph-based orchestration patterns and RAG integration toolkits).
  • Experience designing and managing agent memory systems (working, long-term, episodic) and scalable prompt architectures — including version-controlled prompt libraries, hot-swap update patterns, and persona-specific prompt management across multi-agent systems.
  • Experience building production telemetry and diagnosing distributed, multi-hop workflows using tracing/metrics/logs (OpenTelemetry-style concepts are a plus).

Nice To Haves

  • Familiarity with Databricks, Azure Foundry, and other cloud AI platform patterns and operational requirements for model/agent lifecycle management (versioning, promotion, rollback, policy enforcement, telemetry).
  • Experience in regulated or audit-minded environments where governance, traceability, and operational resilience matter.

Responsibilities

  • Collaborate with business and technical stakeholders to translate real-world research and workflow needs into AI-powered solutions that are measurable, reliable, and safe in production.
  • Architect and build multi-agent workflows (planner/supervisor + specialist agents) designed for non-deterministic behavior and real operational constraints, with explicit state management and routing, and with interoperability via emerging agent protocols (MCP for tool integration, A2A for agent-to-agent delegation).
  • Design and continuously improve retrieval architectures for research assistants (hybrid retrieval + reranking), including advanced strategies such as contextual retrieval / contextual embeddings to reduce retrieval failures and improve grounding coverage.
  • Establish and operationalize AgentOps-style evaluation gates: treat the agent as a versioned artifact (model + prompt + tools + guardrails + eval thresholds), run statistical evaluation suites, and use staged rollout approaches to manage risk while maintaining iteration speed.
  • Implement agentic security controls for systems that ingest external content and use tools/APIs, including defenses against prompt injection and unsafe/over-broad tool execution.
  • Build production-grade observability across multi-step agent executions (traces/metrics/logs), define SLIs/SLOs for reliability and performance, and use telemetry to debug and improve probabilistic runtime behavior.
  • Own reliability outcomes: performance and cost tradeoffs (latency/throughput/cost), failure isolation, and incident response for AI-driven components.
  • Partner effectively with platform, security, and governance functions—ensuring enterprise standards are met while runtime architecture accountability stays with the team operating the production AI behavior.
  • Ramp up quickly on emerging AI frameworks and tooling with a hands-on mindset: prototype rigorously and translate new developments into production-ready implementations with engineering discipline.

Benefits

  • Health, dental, vision and life insurance plans
  • 401(k) Savings plan – with generous company matching contributions (up to 6%)
  • Voya Retirement Plan – employer paid cash balance retirement plan (4%)
  • Tuition reimbursement up to $5,250/year
  • Paid time off – 20 days per year, plus nine paid company holidays and a flexible Diversity Celebration Day
  • Paid volunteer time — 40 hours per calendar year