About The Position

We're rolling out AI agents that do real work across the organization — offloading administrative and operational tasks in Sales, Marketing, Customer Support, and Ops. We've already built a plugin marketplace with 28 agent plugins, 100+ skills, custom CLI tooling, and an eval framework. We need someone who can build new agents, harden what exists, and coach the rest of the team to build their own. This is not an R&D sandbox. You will be measured by what ships, reliability in production, adoption by the team, and — critically — whether others can build and maintain agents without you.

Requirements

  • 4+ years of professional software engineering experience (backend, integrations, automation, platform).
  • Production coding experience in Python and/or TypeScript.
  • Hands-on experience building AI-enabled applications (LLM apps, tool-using agents, or workflow automation) with a focus on reliability and evaluation.
  • Strong prompt engineering skills: ability to write system prompts, skill definitions, and eval rubrics that produce consistent, high-quality agent behavior.
  • Strong testing and ops discipline: unit/integration tests, monitoring/logging, and incident response.
  • Demonstrated ability to teach and coach — whether through mentoring, workshops, pair programming, or documentation. You should enjoy making others more capable, not just shipping your own work.
  • Experience building and shipping backend systems and web services.
  • Comfort with APIs, auth (OAuth/service accounts), permissions/RBAC, and secrets management.
  • Understanding of system design tradeoffs: latency/cost, scalability, reliability, and failure modes.
  • Comfortable with Docker and containerized deployments (for CLI tools and supporting infra).
  • Experience with CI/CD pipelines and production deployment workflows.

Nice To Haves

  • Experience with Claude Code (plugin authoring, skill design, subagent orchestration) or deep familiarity with Anthropic's tool-use patterns.
  • Experience building evaluation pipelines for LLM/agent quality (task success, groundedness, hallucination rate, context faithfulness).
  • Familiarity with Promptfoo or similar eval frameworks for output-quality testing.
  • Experience building and maintaining CLI tools (Python/Typer, Click, or similar) as integration primitives.
  • Experience integrating with CRM/helpdesk/BI systems (e.g., HubSpot, Zendesk, Snowflake, Google Workspace APIs).
  • Experience in regulated environments (healthcare/pharma) with auditability, data minimization, and access controls.
  • Docker experience for containerizing CLI tools and supporting services.

Responsibilities

  • Workflow discovery → agent design → build → test → deploy → monitor → iterate
  • Tool integrations (CRM, helpdesk, BI, docs, comms) via lightweight CLI tools that agents invoke as primitives
  • Quality + safety standards that prevent trust-breaking failures
  • Production operations: evals, logging/traceability, dashboards, incident response, and regression prevention
  • A repeatable agent factory (templates, shared skills, reusable connectors, scaffolding tools) that increases throughput without sacrificing quality
  • Team enablement: coaching staff across all functions to discover, spec, build, and maintain their own agents
  • Shadow functional teams, map workflows, and identify the highest-leverage admin tasks to automate.
  • Turn those into a tight sequence of releases: MVP → v1 → v2.
  • Translate business workflows into agent specifications through collaborative discovery with non-technical stakeholders.
  • Implement agents using Claude Code's plugin architecture: agent identity files, SKILL.md skill definitions, subagent orchestration, and tool-use patterns.
  • Write clear, structured prompts (system prompts, skill instructions, eval rubrics) that produce reliable, repeatable agent behavior.
  • Build agents that run in both modes: attended (human-in-the-loop approvals, confidence cues) and autonomous (policy-based execution, safe escalation, auditable actions).
  • Build and maintain lightweight Python/Typer CLI tools that serve as the connective tissue between agents and business systems (CRM, ticketing, BI/warehouse, knowledge base, email/calendar).
  • Design clean tool interfaces that are both human-usable at the terminal and agent-friendly via tool-use declarations.
  • Write and maintain production code in Python and/or TypeScript.
  • Design for reliability: idempotency, retries/backoff, rate limiting, timeouts, and graceful degradation.
  • Define and implement evals: golden-set test cases, regression suites, fixture-based grounding checks, and launch checklists using Promptfoo or similar frameworks.
  • Write eval rubrics and assertion layers that catch hallucination, format violations, and instruction drift.
  • Debug prompt-level issues — not just code bugs, but behavioral regressions in agent output.
  • Implement observability: structured logs, traces, tool-call auditing, failure clustering, and per-agent health dashboards.
  • Triage production issues, run postmortems, and prevent repeat failures through tests and guardrails.
  • Run hands-on workshops that take non-technical staff from "I have a repetitive task" to "I have a working agent."
  • Pair with team members across functions to co-build agents — not just build for them.
  • Create and maintain playbooks, templates, and guardrails that lower the bar so anyone on the team can ship an agent safely.
  • Establish patterns and conventions that make the agent ecosystem self-service over time.
  • Communicate agent capabilities and limitations honestly — no vapor, no overpromising.
  • Deliver workflow-native entry points (Slack commands, CRM buttons, ticket macros, internal UI).
  • Document runbooks and "how to trust this" guidance based on real capability.
  • Measure adoption and iterate based on usage data, not assumptions.

Benefits

  • 401(k) w/matching
  • all kinds of insurance (including pet insurance and an HSA with matching!)
  • commute from your kitchen
  • Open PTO (which leaders use!)
  • remote stipend
  • yearly education budget
  • working with some of the smartest yet most humble and respectful people in the business
© 2024 Teal Labs, Inc