Senior AI Platform Engineer - HexCore & Eval Systems - OPS00071

Dev.Pro

About The Position

At Dev.Pro, we are seeking a Senior AI Platform Engineer to own the core platform layer that powers every EasyBee AI agent in production. The role covers multi-tenant agent configuration, schema architecture, data pipeline contracts, evaluation harnesses, and customer onboarding automation, and sits at the intersection of backend platform engineering, LangGraph-based orchestration, and AI evaluation systems. The engineer will own the infrastructure for agent orchestration, customer configuration schemas, conversation logging, automated eval pipelines, and deployment scripts. The ideal candidate loves owning systems that other engineers depend on, ships at high velocity, and leaves codebases better than they found them.

Requirements

Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
Proficient or advanced use of agentic coding workflows in tools such as Cursor AI or Claude Code.
4+ years building and owning production-grade backend systems in Python.
Proven experience owning a core platform or shared infrastructure layer used by multiple teams or customers.
Hands-on track record with multi-tenant system design, including schema isolation, config-driven parameterization, and deployment automation.
Experience building evaluation harnesses for LLM-based systems with quantitative metrics.
Python (advanced): async I/O, FastAPI, Pydantic, pytest, type hints, dataclasses.
LangGraph: state machines, conditional edges, node composition, shared state management.
PostgreSQL + pgvector: relational schema design, state persistence, multi-tenant data isolation.
RAG pipelines: vector DB (Pinecone or equivalent), embedding pipelines, retrieval evaluation.
Eval & tracing frameworks: LLM simulation testing, distributed tracing, automated scoring pipelines.
GitHub Actions / CI/CD: automated eval gates, schema validation hooks, environment promotion.
AWS: EC2, S3, RDS, and IAM for production deployment and infrastructure operations.
YAML / config-driven deployment: customer configuration templating, parameterized onboarding scripts.
Strong systems thinking.
Comfort owning a wide surface area.
High individual shipping velocity.
Strong schema discipline.
Ability to work autonomously with minimal supervision.
Strong written communication.

Nice To Haves

Experience with IP-aware architecture decisions or contributing to software patent documentation.
Familiarity with voice agent systems (Twilio, PSTN, LiveKit) and latency-constrained deployments.
Experience with multi-model evaluation (comparing models from OpenAI, Anthropic, and Mistral) using quantitative benchmarks.
Prior work in self-storage, property management, or regulated verticals where data privacy and auditability matter.
Experience contributing to a modular, clean-architecture codebase spanning multiple bounded contexts.
Prior experience in fast-growing startups where you owned infrastructure other engineers depended on daily.

Responsibilities

Own and evolve the core platform repository, implementing modular agent architecture across orchestration, tools, state, retrieval, configuration, and extensibility layers.
Design and maintain customer configuration schemas, including versioning metadata, lineage tracking, and component provenance.
Implement backward-compatible schema extensions and ensure that upgrades introduce no breaking changes.
Enforce schema validation at all node inputs/outputs to prevent data drift in multi-tenant environments.
Build and maintain cross-client isolation for customer configuration, persistent state, and RAG pipelines.
Implement multi-tenant tagging for clean separation of conversation logs, eval datasets, and agent behaviors.
Design config-driven deploy parameterization so that new customers can be deployed through configuration alone.
Ensure all platform changes are backward compatible.
Own the end-to-end conversation logging system, including schema, capture, and metadata persistence to PostgreSQL and S3.
Maintain and extend knowledge base ingestion pipelines: scraping, embedding, vector DB indexing, and retrieval validation.
Define and freeze data contracts between capture specifications and implementation.
Implement multi-tenant data tagging for accurate attribution of logged conversations.
Own the eval suite end-to-end: scenario design, ground-truth dataset curation, automated scoring, and regression CI gates.
Build and maintain LLM simulation test flows that exercise agents across their full range of functionality.
Instrument distributed tracing at the LangGraph node level.
Parameterize the eval suite so it can be reused across customers with minimal configuration.
Define and enforce production-readiness gates (eval score thresholds) that agents must pass before deployment.
Build and maintain onboarding automation scripts that deploy a new customer in under 30 minutes.
Own deploy parameterization, ensuring customer-specific values are injected via config.
Keep shared platform code in sync and consistent across customer repositories.
Document and enforce the standard operating procedure (SOP) for new deployments.
Ensure all platform APIs meet latency targets through profiling, caching, and async optimization.
Maintain structured logging at critical-path nodes.
Implement CI/CD gates that run automated evals and schema validation before changes are merged to production.
Contribute to incident diagnosis by maintaining observable, well-logged systems.