Principal Applied ML Researcher (Agentic Systems & Applied AI Platform)

Red Cell Partners•Seattle, VA

1d•$230,000 - $300,000

About The Position

As a Principal Applied ML Researcher, you will define and drive the ML and LLM strategy for Trase OS, the agentic execution platform powering deployments in regulated environments. You are responsible for how models behave inside real production systems, including agent workflows, tool use, and long-lived execution, not just offline model performance. This is a hands-on technical leadership role operating at the intersection of research, systems, and product. You will drive technical breakthroughs in agentic infrastructure and applied AI systems, own the end-to-end research-to-production lifecycle, and set the standard for how ML systems are designed, evaluated, and deployed across Trase.

Trase OS coordinates long-lived agents, multi-step workflows, tool-augmented LLMs, and execution in regulated environments. As the system scales, the core challenge shifts from model capability to system correctness and reliability, where models may succeed offline but fail in real workflows, agent behavior can become unpredictable or unsafe, evaluation can drift from real outcomes, and ML decisions can introduce system-level instability. This role defines how ML systems are integrated into execution systems, evaluated end-to-end, and operated reliably in production.

Requirements

12–15+ years of experience in machine learning, including building and deploying applied ML systems in production environments.
Strong programming skills in Python, with experience in Java, C++, or related languages in systems contexts.
Deep expertise in at least one major ML domain, such as LLMs and generative AI, NLP or multimodal systems, deep learning, or graph learning.
Hands-on experience with prompt engineering, multi-agent orchestration, tool integration via APIs, memory management, and human-in-the-loop system design.
Proven experience building and shipping enterprise-grade AI systems, including GenAI, LLM, or agent-based applications at scale.
Experience designing and implementing evaluation frameworks, including metrics, benchmarks, and testing systems.
Strong understanding of ML system behavior in production, including reliability, latency, cost tradeoffs, and failure modes.
Experience deploying ML systems in regulated or constrained environments and familiarity with modern ML infrastructure such as cloud platforms and containerized systems.
Demonstrated ability to lead technical direction across teams and drive systems from concept to production impact.

Nice To Haves

Experience working with agent-based systems, retrieval-augmented generation, RLHF, or synthetic data approaches.
Experience building and operating AI/ML platforms that support the full model lifecycle.
Experience optimizing ML inference in real-time, distributed, or resource-constrained environments.
Familiarity with data privacy, compliance, and responsible AI frameworks relevant to regulated industries.
Prior experience operating at the Staff or Principal level in a scaling organization.

Responsibilities

Technical Leadership & Innovation: Drive technical breakthroughs in agentic systems, applied ML infrastructure, and LLM-based applications. Define and evolve the ML/LLM strategy and technology roadmap in alignment with product development. Act as a principal technical authority, making high-impact architectural and modeling decisions across teams.
Research → Prototype → Production: Develop prototypes for key technologies to validate new approaches and de-risk system design. Own the full lifecycle from research and experimentation through production deployment, monitoring, and iteration. Translate advances in ML into scalable, production-grade systems with measurable impact.
Agentic Systems & Applied ML: Design how LLMs operate within agent workflows, tool use, multi-step reasoning, and long-lived execution. Implement and refine prompting strategies, multi-agent orchestration, memory management, and human-in-the-loop controls for safety and reliability. Establish patterns for planning, decision-making, and tool orchestration within complex systems.
Evaluation, Quality & Reliability: Own end-to-end quality evaluation of ML-powered systems, including defining metrics, benchmarks, and testing frameworks. Establish evaluation systems that connect model performance to task success and system-level outcomes. Ensure systems behave predictably, safely, and reliably in production through monitoring, regression testing, and robust failure handling.
ML Systems & Platform Integration: Contribute to the design of ML systems supporting the full lifecycle, including training, fine-tuning, evaluation, deployment, and monitoring. Drive architecture decisions across model serving, routing, orchestration, and latency and cost optimization. Work across infrastructure layers, including cloud and containerized systems, to ensure scalable and efficient deployment.
Enterprise & Regulated Deployments: Build and deploy enterprise-grade AI systems used by global customers in production environments. Design systems that operate reliably in regulated and constrained settings, including on-premise, air-gapped, and secure cloud environments. Ensure systems are auditable, explainable, and compliant with regulatory and organizational requirements.
Communication & Influence: Write technical reports and design documents summarizing R&D progress, system behavior, and key decisions. Communicate complex ML concepts and tradeoffs clearly to both technical and non-technical stakeholders. Drive alignment across research, engineering, and product through strong technical leadership.
Mentorship & Organizational Impact: Mentor junior and senior engineers and researchers, raising the bar for ML rigor and system-level thinking. Establish and propagate best practices for ML system design, evaluation, and reliability across the organization. Influence technical direction beyond immediate teams through high-impact, cross-functional work.

Benefits

Career-track opportunity with potential for rapid advancement, based on strong performance, as the firm grows.
100% employer-paid comprehensive health care, including medical, dental, and vision, for you and your family.
Paid maternity and paternity leave for 14 weeks at the employee's normal pay.
Unlimited PTO, with management approval.
Opportunities for professional development and continued learning.
Optional 401K, FSA, and equity incentives available.
Mental health benefits are available through Tara Mind.
Cost-effective GLP-1 solutions are available through Crux.
