Software Engineer III

Pearson
Durham, NC

About The Position

AgentOps is the enterprise engineering foundation for building, operating, and governing AI agents and digital workers as production-grade systems. We are enabling the shift from simple chat-based experiences to agentic systems that can reason, plan, use tools, and execute complex workflows reliably across the enterprise. Our mission is to provide the platform capabilities, reusable skills, and operational controls required to scale intelligent digital workers with strong standards for reliability, security, observability, and compliance.

The Role

As a Software Engineer III on the Agent Engineering team, you will design and build core platform capabilities that power intelligent, stateful, and production-ready agents, digital workers, and reusable skills. This is a hands-on senior engineering role focused on orchestration, agent runtime patterns, resilience, memory, retrieval, and observability. You will help define reusable engineering patterns for how digital workers are built, how skills are packaged and reused, and how agentic workflows are operated across the platform. You will work closely with partner teams to translate complex business workflows into robust, governed, and scalable agentic services.

Requirements

  • 8+ years of software engineering experience with strong proficiency in Python and backend/platform engineering.
  • Hands-on experience building LLM-powered systems, agents, digital workers, or workflow automation platforms in production.
  • Experience with frameworks such as LangGraph, CrewAI, AutoGen, LangChain, LlamaIndex, or similar.
  • Strong experience in APIs, distributed systems, cloud-native engineering, and production reliability.
  • Experience designing and integrating RAG pipelines, tool-calling systems, reusable skills, and structured output patterns.
  • Experience with at least one major cloud platform such as AWS, Azure, or GCP, along with Docker, Kubernetes, and CI/CD practices.
  • Ability to design systems with strong trade-off awareness across quality, latency, cost, resilience, and maintainability.

Nice To Haves

  • Experience with MCP or similar tool/context interoperability protocols.
  • Experience with Redis, DynamoDB, Postgres, or workflow/state stores for orchestration and persistence.
  • Familiarity with multi-agent systems, digital worker architectures, skill registries, and human-in-the-loop execution models.
  • Experience with AI observability, evaluation frameworks, and operational telemetry for LLM systems.
  • Understanding of secure execution patterns, sandboxing, and prompt injection mitigation.
  • Ability to translate emerging research and ecosystem patterns into pragmatic production solutions.

Responsibilities

ADVANCED ORCHESTRATION & DIGITAL WORKER EXECUTION

  • Design and implement multi-agent and digital worker orchestration patterns that enable specialized agents to delegate, collaborate, and complete multi-step business goals.
  • Build stateful and cyclic workflows using frameworks such as LangGraph, CrewAI, AutoGen, or similar, enabling reflection, recovery, and adaptive execution beyond simple linear chains.
  • Develop reusable orchestration components for routing, retries, fallback logic, structured outputs, and human-in-the-loop interventions.
  • Define how digital workers compose and invoke reusable skills across common enterprise workflows.

SKILLS, TOOLING & INTEROPERABILITY

  • Build and maintain reusable skills that encapsulate business actions, domain logic, tool usage, and workflow steps in a standardized way.
  • Define contracts and standards for how skills are exposed, discovered, versioned, and consumed by agents and digital workers.
  • Contribute to standards for MCP, tool calling, and agent interaction contracts across the platform.
  • Integrate enterprise APIs, services, and data systems into reusable skills with strong attention to safety, governance, and maintainability.

STATEFUL EXECUTION, RELIABILITY & AGENT RUNTIME ENGINEERING

  • Design systems for long-running, resumable workflows for agents and digital workers, including checkpointing, persistence, context restoration, and lifecycle management.
  • Implement resilience patterns for non-deterministic AI systems, including timeout handling, intelligent retries, degraded execution modes, and escalation paths.
  • Improve runtime reliability, scalability, and cost efficiency of agent and digital worker workloads in production.
  • Partner with infrastructure and platform teams to harden execution across cloud-native environments.

RAG, MEMORY & KNOWLEDGE-AUGMENTED INTELLIGENCE

  • Build and optimize retrieval-augmented generation pipelines using vector databases, hybrid retrieval, re-ranking, and grounding strategies.
  • Design memory patterns that improve continuity and contextual relevance across agent and digital worker sessions, including short-term, episodic, and semantic memory approaches.
  • Integrate enterprise knowledge sources and structured systems securely into workflows and skills.
  • Evaluate and improve answer quality, retrieval performance, and contextual fidelity.

EVALUATION, GUARDRAILS & OBSERVABILITY

  • Build automated evaluation frameworks to measure workflow quality, skill execution quality, tool-use accuracy, groundedness, safety, and task success.
  • Instrument deep tracing and operational observability using tools such as Langfuse, LangSmith, Arize Phoenix, OpenTelemetry, or similar.
  • Define and monitor engineering KPIs such as latency, cost per run, fallback rates, workflow completion success, skill reliability, and production health.
  • Contribute to guardrails for safe execution, prompt injection resistance, and policy-compliant agent behavior.

TECHNICAL LEADERSHIP & PLATFORM CONTRIBUTION

  • Drive reusable engineering standards, shared libraries, and reference patterns for agent development, digital workers, and skills across the platform.
  • Mentor other engineers through design reviews, code reviews, and implementation guidance.
  • Partner with product, architecture, and domain teams to shape scalable solutions for enterprise use cases.
  • Stay current on the evolving agentic AI ecosystem and evaluate new frameworks, techniques, and runtime patterns pragmatically for enterprise adoption.

What This Job Offers

Job Type

Full-time

Career Level

Mid Level

Education Level

No Education Listed

Number of Employees

5,001-10,000 employees
