About The Position

We are looking for a Principal Software Engineer to join our team and drive all aspects of AI feature fundamentals for one of the world's biggest modern collaboration platforms: Microsoft Teams. We help feature teams ship quality AI experiences out of the gate, track key performance and reliability metrics for critical high-volume scenarios, improve the debuggability of AI scenarios, create offline and online evals for all AI features by incorporating them into release pipelines, and drive a culture of performance through best practices and consulting. As a team, we obsess about learning, diving deep into areas of opportunity, experimenting, and using an evidence-based approach to turn opportunities into a positive impact on product performance through collaboration.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Requirements

  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • 2+ years of experience in engineering tooling or eval development.
  • 2+ years of experience working on services at scale.
  • 1+ year of experience driving fundamentals for AI features within web apps.

Nice To Haves

  • Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • Prior experience in driving fundamentals for AI features within web apps.
  • Understanding of building engineering tools on the server side for scale.
  • Prior experience building AI workflows is a plus.
  • Prior experience working closely with AI feature teams to improve fundamentals like performance and reliability is a major plus.
  • Experience solving challenging problems and cross team/organization collaboration skills.
  • Proficiency with React.
  • Curiosity to dive deep, continuously learn and experiment.
  • Passion for collaboration and knowledge sharing.

Responsibilities

  • Define the vision, strategy, and roadmap for how to evaluate AI features for good fundamentals at scale across Teams.
  • Lead end-to-end science and technical design for evaluating LLM-powered agents on real-time and batch workloads: designing evaluation frameworks, metrics, and pipelines that capture planning quality, tool use, retrieval, safety, and end-user outcomes, and partnering with engineering for robust, low-latency deployment.
  • Establish rigorous evaluation and reliability practices for LLM/agent systems: from offline benchmarks and scenario-based evals to online experiments and production monitoring, defining guardrails and policies that balance quality, cost, and latency at scale.
  • Collaborate with PM, Engineering, and UX to translate evaluation insights into customer-visible improvements, shaping product requirements, de-risking launches, and iterating quickly based on telemetry, user feedback, and real-world failure modes.
  • Collaborate and mentor across product, research, and engineering teams, sharing best practices on eval design, LLM-as-judge usage, and Responsible AI, and providing code reviews and guidance that raise the bar for the AI features.
  • Provide technical leadership and mentorship within the applied science and engineering community, fostering inclusive, responsible-AI practices in agent evaluation, and influencing roadmap, platform investments, and cross-team evaluation strategy across Fabric.