Senior Machine Learning Engineer (Agent Systems)

eCue

Remote

About The Position

e:cue is a fast-paced, high-growth startup building custom AI analysts for leaders in marketing, finance, and revenue. Our platform combines production-grade application services, cloud infrastructure, and agent systems that power high-stakes business decisions.

We're looking for a senior engineer who can take work from ticket to outcome: scope it, build it, ship it, and own it in production. This role owns core parts of the agent stack: deciding how agents plan and execute, how they interact with data, and how we evaluate and improve them over time.

You'll work across:

Agent systems: planning, tool use, multi-agent orchestration, and long-context workflows
Backend and infrastructure: agent services, data pipelines, and observability
Evaluation and post-training: designing evaluation harnesses, feedback loops, and datasets, and improving agent behavior

Requirements

Experience building or working with LLM-powered systems
Familiarity with agents, tool use, or structured reasoning systems
Experience with ML evaluation systems for ambiguous objectives
Ability to own problems end-to-end
Strong product intuition

Nice To Haves

Experience with ML systems or training workflows, including fine-tuning (SFT, DPO, RLHF, etc.), dataset construction, and evaluation pipelines
Experience building agent frameworks for tool-using LLMs in long-context or retrieval-heavy workflows
Familiarity with modern inference, frontier APIs, and serving stacks (vLLM, SGLang, or similar)
Experience at a startup owning large systems independently

Responsibilities

Design and build production agent systems: tool execution frameworks (MCP servers, sandbox environments, tool architectures); planning and reasoning pipelines; and context- and dependency-aware agent execution
Own services that power production agents: reliability, latency, and scaling improvements; observability integrity (logging, tracing, and evaluation hooks for offline and online evaluation)
Develop evaluation and feedback systems: define metrics for agent performance (offline and online); own evaluation harnesses and test suites; and instrument systems to generate high-quality evaluation and training data
Contribute to post-training and model improvement: dataset generation (trajectory collection, preference data); fine-tuning (SFT, DPO, etc.) for modules where context engineering isn't enough; and prompting and system design for better reasoning and context management