AI Engineer

Varick Agents · San Francisco, CA

About The Position

AI Engineers at Varick own the intelligence layer. You design, build, and optimize the agent systems that run inside enterprise operations — processing thousands of transactions, making classification decisions, routing exceptions, and learning from human feedback. This role is for engineers who have been deep in LLMs, agent architectures, and evaluation systems. You’ve built agentic workflows that run in production, not just demos. You understand prompt engineering, retrieval, tool calling, multi-agent orchestration, and the evaluation infrastructure required to ship AI systems that enterprises trust.

Requirements

  • 3+ years of software engineering, including 1–2 years focused on LLM applications or AI systems in production
  • Hands-on experience building agentic workflows with tool calling, retrieval, and multi-step reasoning
  • Deep understanding of prompt engineering, context engineering, and how to get reliable behavior from LLMs
  • Experience building evaluation and quality systems for AI outputs
  • Strong Python skills and backend engineering fundamentals
  • You’ve shipped AI features to real users and dealt with the messy parts: hallucinations, edge cases, accuracy degradation, cost management
  • Based in San Francisco

Nice To Haves

  • Agent frameworks: LangGraph, CrewAI, Claude Code/Codex patterns, or custom orchestration
  • Retrieval systems: vector databases (Qdrant, pgvector, Pinecone), reranking, hybrid search
  • MCP, tool-calling protocols, and third-party API integrations
  • Fine-tuning, LoRA, or other model adaptation methods
  • Evaluation frameworks and continuous quality monitoring
  • Experience with enterprise AI deployments (compliance, audit trails, governance)
  • Prior work at AI labs, AI-native startups, or applied ML teams

Responsibilities

  • Design and build agent architectures for complex enterprise workflows (multi-step reasoning, tool calling, exception handling)
  • Build and maintain evaluation systems for agent quality, accuracy, safety, and groundedness
  • Design prompt systems, retrieval pipelines, and context engineering strategies for reliable agent behavior
  • Build the feedback loops that allow agents to learn from human corrections and improve over time
  • Optimize inference cost and latency for production workloads
  • Define best practices for agent reliability, observability, and governance
  • Stay current with the latest models, frameworks, and research — and ship what matters into production