Senior Applied AI Engineer

Function Health • Canada, KS

About The Position

Function Health is seeking a Senior Applied AI Engineer to own the agentic system that turns member health data into clear next steps, working alongside clinicians. The role involves building and operating production multi-agent systems end to end: orchestration graphs, tool use and memory, retrieval over member records, and the evaluations and observability needed for reliability. The engineer will integrate frontier LLMs with Function's proprietary data while meeting healthcare standards for safety, latency, and cost. This is a hands-on, high-ownership position for an engineer with experience building agents at scale who wants to have a positive impact on human health.

Requirements

1+ years building agentic AI systems.
6+ years as a full-stack or ML engineer, building production backends or ML systems in Python, Go, or similar.
Fluency with agentic orchestration frameworks (e.g., LangGraph, PydanticAI, DSPy, LlamaIndex) and tool/function calling.
Experience integrating frontier LLMs and multimodal models via managed APIs or self-hosted serving.
Strong skills in API design, backend frameworks (FastAPI, Flask), and event-driven architectures.
Data systems expertise with PostgreSQL, plus experience with token streaming and throughput tuning.
Retrieval and memory: vector databases (pgvector, Pinecone, Weaviate, Milvus), hybrid search, and graph/knowledge storage.
Production evals: LLM-as-judge, human-in-the-loop, rubric design, and CI-integrated regression tests.
Observability and SRE: OpenTelemetry traces, metrics, structured logs, SLOs, dashboards, and on-call triage.
Cloud-native delivery: Kubernetes, Terraform, Docker, GPU scheduling/autoscaling on AWS or GCP.
CI/CD proficiency with GitHub Actions and test automation for prompts, tools, and agents.
Clear, concise communication and high ownership in fast-paced environments.

Nice To Haves

Real-time multimodal systems: streaming ASR, low-latency TTS, WebRTC, and vision pipelines.
RAG expertise beyond basics: Graph RAG, multi-hop retrieval, sub-agents, query planning, and freshness policies.
Safety and governance: policy-as-code, red-teaming, PII handling, audit logs, and role-based tool authorization.
Regulated data experience (HIPAA, SOC 2, GDPR) and data residency controls.
Personalization at inference time, long-term memory agents, session state, and episodic memory stores.
Experience with consumer-scale AI apps, high-traffic systems, or on-device/edge acceleration (WebGPU).

Responsibilities

Architect and build stateful, graph-based agent workflows with tool use, planning, and memory.
Integrate LLMs and multimodal models via structured I/O (JSON Schema, Pydantic validators) and function/tool calling.
Build high-reliability APIs and streaming services for real-time inference, speech, and vision.
Own production readiness: tracing, logging, metrics, rate limiting, circuit breakers, and SLOs.
Stand up eval pipelines: offline golden sets, LLM-as-judge with human rubrics, online A/B, and regression tests in CI.
Implement retrieval and memory: hybrid search, vector and graph retrieval, semantic caches, and long-horizon context.
Optimize cost/latency: model routing, prompt and tool selection, quantization, and KV cache/prefill strategies.
Partner cross-functionally to translate research into robust production systems and iterate quickly behind evaluation gates.
Mentor engineers through design docs and architecture decisions.