Senior Software Engineer, AI Platform

isolved

Onsite

About The Position

The isolved Senior Software Engineer, AI Platform role owns both program execution and technical direction, leading ~20 engineers across domain teams (Tax, Benefits, Time, Payroll, Shared Logic), alongside two Engineering Managers and a Data Architect. The position blends delivery leadership with deep technical involvement, serving as a key decision-maker, escalation point for complex challenges, and ultimate owner of program outcomes.

Requirements

5+ years of professional software engineering experience, with Python as your primary language
2+ years building production LLM-powered systems - inference, RAG, agentic patterns, or AI infrastructure
Deep Python expertise - this is the primary language for AI platform work
Working proficiency in C#/.NET - the platform serves teams that live in C#, so interop is real and matters
Strong hands-on experience with agentic frameworks - Semantic Kernel, LangGraph, LangChain, or a framework you've built yourself
Production experience with RAG architecture: chunking strategies, embedding models, vector search, retrieval quality, and the failure modes that don’t show up in demos
Azure AI Foundry / Azure OpenAI experience - model deployment, API integration, observability tooling
Experience building internal platforms or SDKs that other engineers depend on - you understand what makes a platform feel good to use
Strong grasp of AI observability: token usage, latency, cost tracking, and distributed tracing across multi-agent workflows

Nice To Haves

Experience with TypeScript and building developer SDKs or tooling
Hands-on experience with AI evaluation frameworks (LLM-as-judge, automated regression testing)
Knowledge of AI governance practices, including access control, audit logging, and security safeguards
Familiarity with container-based deployments (e.g., Azure Container Apps) and infrastructure-as-code (Terraform)
Awareness of AI regulatory frameworks such as NIST AI RMF or ISO/IEC 42001

Responsibilities

Design and build a scalable LLM gateway with model routing, prompt management, cost attribution, rate limiting, and caching
Develop and operate RAG pipelines, embedding services, and vector search infrastructure for platform-wide use
Implement platform-level cost optimization strategies, including semantic caching and model selection by workload
Build and maintain agentic runtime infrastructure, including orchestration, state management, and human-in-the-loop patterns
Develop extensible MCP server and tool ecosystems for product team integration
Design and support multi-agent coordination patterns using modern frameworks and protocols
Establish comprehensive AI observability, including usage, latency, cost tracking, and distributed tracing
Implement AI governance controls, including access management, audit logging, content filtering, and security protections
Build AI incident detection and response capabilities, including monitoring for failures, hallucinations, and cost anomalies
Create developer-friendly SDKs across languages (Python, .NET, TypeScript) to simplify platform adoption
Define "paved road" patterns for common AI use cases and support onboarding of product teams
Build automated evaluation pipelines and continuously monitor production quality and model performance