Principal AI Researcher (Agentic Systems & AI Infrastructure)

Trase Systems • McLean, VA

1d • $250,000 - $300,000

About The Position

As a Principal AI Researcher, you will define and drive the long-term research direction for the Trase operating system, the agentic execution platform powering autonomous systems in regulated environments. This role sits at the intersection of frontier AI research, agentic systems, orchestration infrastructure, and production deployment, with a focus on how models behave inside real-world execution environments rather than solely on offline benchmark performance.

You will lead research across areas such as agent workflows, tool use, long-lived execution, orchestration, and autonomous system reliability, while conducting large-scale experimentation and advancing novel approaches in applied AI systems. This is a hands-on technical leadership role operating across research, systems, and product. You will drive technical breakthroughs in agentic infrastructure and applied AI systems, own the end-to-end research-to-production lifecycle, and work closely with engineering and product teams to translate frontier research into scalable, production-grade systems deployed across Trase.

Trase OS coordinates long-lived agents, tool-augmented LLMs, multi-agent workflows, and execution in regulated enterprise environments. As these systems scale, the core challenge shifts from raw model capability to system correctness, orchestration reliability, infrastructure governance, and safe autonomous execution.

We are particularly interested in candidates with expertise or research interest in areas such as: agent-to-agent learning; orchestration and harness engineering; infrastructure governance for AI operating systems; long-lived execution and memory systems; small language models (SLMs), model optimization, and fine-tuning recipes; post-training adaptation techniques and model behavior shaping; and evaluation frameworks for autonomous agents. This role will help define how next-generation AI systems are researched, evaluated, and safely operated in production.

Requirements

12–15+ years of experience in machine learning, AI systems, or applied AI research, including experience operating at a Principal, Distinguished, or equivalent technical level.
Strong research and publication track record, including authored papers, major technical contributions, or active participation in frontier AI research.
Experience publishing at top-tier conferences or contributing influential open-source, research, or AI infrastructure systems.
Experience conducting large-scale experimentation requiring significant compute infrastructure, evaluation workflows, and iterative model/system analysis.
Deep expertise in one or more areas including agentic systems, LLMs and generative AI, multi-agent systems, reasoning systems, reinforcement learning, orchestration infrastructure, AI systems reliability, NLP, multimodal systems, or deep learning.
Hands-on experience with agent-based systems, prompt engineering, RAG, RLHF, SLMs, fine-tuning/post-training techniques, tool integration, memory systems, and human-in-the-loop orchestration.
Proven experience building, deploying, and operating enterprise-grade AI systems, including GenAI, LLM, or agent-based applications at scale.
Strong understanding of ML system behavior in production, including reliability, latency, cost tradeoffs, observability, evaluation frameworks, regression testing, and failure modes.
Strong systems thinking and demonstrated ability to partner cross-functionally with engineering and product organizations to move research into production systems.
Strong programming and prototyping skills in Python and modern ML infrastructure stacks, with experience in Java or related systems languages preferred.
Experience deploying AI/ML systems in regulated, constrained, or enterprise environments, and demonstrated ability to lead technical direction from research through production impact.

Nice To Haves

PhD in Computer Science, Machine Learning, AI, Systems, or a related field.
Experience building and operating AI/ML platforms supporting the full model lifecycle, including training, evaluation, deployment, and monitoring.
Experience optimizing ML inference or orchestration systems in real-time, distributed, or resource-constrained environments.

Responsibilities

Define and evolve the long-term AI/ML research strategy and technical roadmap for Trase OS in alignment with product and platform direction.
Lead large-scale experimentation and prototyping efforts requiring significant compute infrastructure, translating frontier AI research into scalable, production-grade systems with measurable impact.
Drive original research and technical breakthroughs in agentic systems, autonomous execution, multi-agent orchestration, post-training and fine-tuning systems, SLM/LLM-based architectures, and applied AI infrastructure.
Design how models operate within long-lived execution environments, including agent workflows, tool use, planning, memory systems, reasoning, and human-in-the-loop controls.
Establish evaluation methodologies and reliability frameworks for autonomous systems, including benchmarking, regression testing, safety, controllability, and production behavior analysis.
Drive architecture decisions across orchestration, model serving, routing, inference, and infrastructure governance, including latency, reliability, and cost optimization.
Partner closely with engineering and product teams to operationalize research outcomes into deployable systems and enterprise workflows.
Build AI systems that operate reliably in regulated and constrained environments, including secure cloud, on-premise, and air-gapped deployments.
Contribute to the broader AI research community through technical papers, publications, conference participation, architecture proposals, and thought leadership.
Serve as a senior technical authority and mentor across the organization, influencing technical direction, research rigor, experimentation practices, and best practices across research, engineering, and product teams.

Benefits

Career-track opportunity with potential for rapid advancement, based on strong performance, as the firm grows.
100% employer-paid, comprehensive health care, including medical, dental, and vision, for you and your family.
Paid maternity and paternity leave for 14 weeks at the employee's normal pay.
Unlimited PTO, with management approval.
Opportunities for professional development and continued learning.
Optional 401(k), FSA, and equity incentives available.
Mental health benefits are available through Tara Mind.
