AI Engineer

Varick Agents · San Francisco, CA

About The Position

AI Engineers at Varick own the intelligence layer. You design, build, and optimize the agent systems that run inside enterprise operations — processing thousands of transactions, making classification decisions, routing exceptions, and learning from human feedback. This role is for engineers who have been deep in LLMs, agent architectures, and evaluation systems. You’ve built agentic workflows that run in production, not just demos. You understand prompt engineering, retrieval, tool calling, multi-agent orchestration, and the evaluation infrastructure required to ship AI systems that enterprises trust.

Requirements

  • 3+ years of software engineering, including 1–2 years focused on LLM applications or AI systems in production
  • Hands-on experience building agentic workflows with tool calling, retrieval, and multi-step reasoning
  • Deep understanding of prompt engineering, context engineering, and how to get reliable behavior from LLMs
  • Experience building evaluation and quality systems for AI outputs
  • Strong Python skills and backend engineering fundamentals
  • You’ve shipped AI features to real users and dealt with the messy parts: hallucinations, edge cases, accuracy degradation, cost management
  • Based in San Francisco

Nice To Haves

  • Agent frameworks: LangGraph, CrewAI, Claude Code/Codex patterns, or custom orchestration
  • Retrieval systems: vector databases (Qdrant, pgvector, Pinecone), reranking, hybrid search
  • MCP, tool-calling protocols, and third-party API integrations
  • Fine-tuning, LoRA, or other model adaptation methods
  • Evaluation frameworks and continuous quality monitoring
  • Experience with enterprise AI deployments (compliance, audit trails, governance)
  • Prior work at AI labs, AI-native startups, or applied ML teams

Responsibilities

  • Design and build agent architectures for complex enterprise workflows (multi-step reasoning, tool calling, exception handling)
  • Build and maintain evaluation systems for agent quality, accuracy, safety, and groundedness
  • Design prompt systems, retrieval pipelines, and context engineering strategies for reliable agent behavior
  • Build the feedback loops that allow agents to learn from human corrections and improve over time
  • Optimize inference cost and latency for production workloads
  • Define best practices for agent reliability, observability, and governance
  • Stay current with the latest models, frameworks, and research — and ship what matters into production