Senior AI Engineer, Agentic Systems & Runtime Architecture

Voya Financial•New York, NY

Hybrid

About The Position

We’re looking for a hands-on Senior AI Engineer to lead the design, build, and operation of production agentic AI systems—including multi-agent research assistants that deliver cited, grounded answers via both conversational experiences and programmatic APIs. You’ll own “runtime architecture” decisions (orchestration/routing, retrieval strategy, model serving patterns, and runtime controls) and help evolve our capabilities toward more sophisticated agentic design: planner/supervisor orchestration, advanced retrieval + reranking, evaluation gates (AgentOps), agentic security, and end-to-end observability.

Requirements

Proven experience designing and building LLM-powered applications in production, including prompt/tool orchestration and grounded response patterns.
Hands-on experience implementing multi-agent orchestration (planner/supervisor patterns, tool chaining, state management, and conditional routing).
Strong understanding of advanced retrieval for RAG: hybrid retrieval, rank fusion concepts, and reranking, with bonus points for contextual retrieval/contextual embeddings approaches.
Demonstrated ability to build evaluation systems for non-deterministic AI/agent behavior (rubrics/metrics, regression suites, and release gates), replacing “vibe checks” with systematic improvement loops.
Experience with AgentOps / LLMOps practices, including staged rollout models and continuous monitoring for quality, drift, safety, and cost-per-task.
Strong security mindset for LLM applications, including awareness of prompt injection (direct and indirect) and defense-in-depth patterns (input sanitization, structured prompts, output validation, least privilege, HITL where appropriate).
Proficiency in Python and modern AI engineering frameworks commonly used for agentic systems (e.g., graph-based orchestration patterns and RAG integration toolkits).
Experience designing and managing agent memory systems (working, long-term, episodic) and scalable prompt architectures — including version-controlled prompt libraries, hot-swap update patterns, and persona-specific prompt management across multi-agent systems.
Experience building production telemetry and diagnosing distributed, multi-hop workflows using tracing/metrics/logs (OpenTelemetry-style concepts are a plus).

Nice To Haves

Familiarity with Databricks, Azure Foundry, and other cloud AI platform patterns, including operational requirements for model/agent lifecycle management (versioning, promotion, rollback, policy enforcement, telemetry).
Experience in regulated or audit-minded environments where governance, traceability, and operational resilience matter.

Responsibilities

Collaborate with business and technical stakeholders to translate real-world research and workflow needs into AI-powered solutions that are measurable, reliable, and safe in production.
Architect and build multi-agent workflows (planner/supervisor + specialist agents) with explicit state management and routing, designed for non-deterministic behavior and real operational constraints, and interoperable via emerging agent protocols (MCP for tool integration, A2A for agent-to-agent delegation).
Design and continuously improve retrieval architectures for research assistants (hybrid retrieval + reranking), including advanced strategies such as contextual retrieval / contextual embeddings to reduce retrieval failures and improve grounding coverage.
Establish and operationalize AgentOps-style evaluation gates: treat the agent as a versioned artifact (model + prompt + tools + guardrails + eval thresholds), run statistical evaluation suites, and use staged rollout approaches to manage risk while maintaining iteration speed.
Implement agentic security controls for systems that ingest external content and use tools/APIs, including defenses against prompt injection and unsafe/over-broad tool execution.
Build production-grade observability across multi-step agent executions (traces/metrics/logs), define SLIs/SLOs for reliability and performance, and use telemetry to debug and improve probabilistic runtime behavior.
Own reliability outcomes: performance and cost tradeoffs (latency/throughput/cost), failure isolation, and incident response for AI-driven components.
Partner effectively with platform, security, and governance functions—ensuring enterprise standards are met while runtime architecture accountability stays with the team operating the production AI behavior.
Ramp up quickly on emerging AI frameworks and tooling, prototype rigorously, and translate new developments into production-ready implementations with engineering discipline.

Benefits

Health, dental, vision and life insurance plans
401(k) Savings plan – with generous company matching contributions (up to 6%)
Voya Retirement Plan – employer paid cash balance retirement plan (4%)
Tuition reimbursement up to $5,250/year
Paid time off – 20 days, plus nine paid company holidays and a flexible Diversity Celebration Day
Paid volunteer time – 40 hours per calendar year
