Staff AI Engineer

RBC • Toronto, ON

1d • Onsite

About The Position

We're building the engine that judges how good our agents actually are. Claims have to be data-driven: you can't build on what you can't see, so how can you honestly say one version is 10% better than the last? Evaluation runs both before we ship and after. This role owns the runtime side: judging agents live in production, from the traces they generate while serving real traffic. The hard part is the data. Agent behaviour generates verbose, high-cardinality traces, and we need a system that can analyze them in real time and deliver actionable insights at low latency. Join us to build it: the engineering looks a lot like site reliability engineering meeting user analytics, combining high-throughput, low-latency data processing with the evaluation of user behaviour and outcomes.

Requirements

8+ years in software or platform engineering, with 5+ in SRE, real-time data infrastructure, observability, or large-scale stream processing.
A track record of running high-volume telemetry in production, with hands-on work on ingestion, storage, and query at scale.
Distributed tracing and OpenTelemetry: semantic conventions, collector configuration, span correlation across services.
Familiarity with routing traffic on live signals, whether through weighted load balancing, canary rollouts, or multi-armed-bandit routing.
Turning telemetry into decisions in real time — scoring, anomaly detection, or rule/threshold evaluation on streaming data.
An LLM observability platform (`Langfuse`, `MLflow`, or equivalent) and the trace-to-evaluation feedback loop.

Nice To Haves

A feel for the latency and backpressure trade-offs of doing work in the live request path — collectors, proxies, sidecars.
Experience in a regulated industry (financial services, healthcare) and its constraints on AI infrastructure.
AI security controls in the request path: prompt-injection mitigation, output filtering, PII detection.
AI governance, model audit logging, and runtime drift detection.
Open-source contributions or published work in observability, tracing, or LLM evaluation.

Responsibilities

Build the ingestion path that takes agent traces at production volume and keeps up with it.
Score agent behaviour live — judge quality straight from the trace as it happens, not in a batch job hours later.
Enforce quality and safety guardrails in the request path, stopping unsafe or low-quality output before it reaches the user, within a fixed latency budget and at predictable cost.
Correlate spans across services so one request reads as one trace.
Own the experience of turning production traces back into datasets and test cases the next version is measured against.
Set the technical direction for this emerging field, and push it into the open through open-source contributions and conference talks.

Benefits

Bonuses
Flexible benefits
Competitive compensation
Commissions
Stock where applicable
Leaders who support your development through coaching and management opportunities
Ability to make a difference and lasting impact
Work in a dynamic, collaborative, progressive, and high-performing team
A world-class training program in financial services
Opportunities to do challenging work