Senior Software Engineer

Microsoft · Redmond, WA

About The Position

Do you want to build the AI platform that trains the world’s most advanced models? Join CoreAI, where we’re building next-generation systems for large-scale reinforcement learning. Our platforms power the full lifecycle of cutting-edge LLMs, from rapid experimentation to global production deployment.

On our team, you’ll play a pivotal role in making large-scale training and reinforcement learning workflows faster, safer, and more reliable. You’ll:

  • Build and evolve distributed services that underpin massive training runs.
  • Improve iteration loops for researchers and engineers.
  • Debug complex interactions between models and hardware.
  • Apply advanced performance optimization techniques that directly impact product quality and operational excellence.
  • Develop deep expertise in ML systems, AI infrastructure, and compute orchestration.
  • Ship platform capabilities that enable mission-critical AI workloads for customers around the world.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more, and we’re dedicated to this mission across every aspect of our company. Our culture is centered on embracing a growth mindset and encouraging teams and leaders to bring their best each day. Join us and help shape the future of the world.

Requirements

  • Bachelor's Degree in Computer Science or related technical field AND 4+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Nice To Haves

  • Master's Degree in Computer Science or related technical field AND 8+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 12+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • 5+ years of experience with ML/RL techniques, and knowledge of open-source LLMs, Kubernetes and containers, and cloud services.

Responsibilities

  • Independently uses appropriate artificial intelligence (AI) tools and practices across the software development lifecycle (SDLC) in a disciplined manner. Takes responsibility for the content of their AI-generated requirements, design documents, code, and other assets, assisting other members of the team to do the same. Uses SDLC and engineering health measures (e.g., Accelerate, SPACE framework, Engineering System Success Playbook [ESSP]) to improve processes and practices, especially those involving AI. Experiments with AI tools and practices to improve their own capabilities.
  • Leads by example within the team to produce extensible, maintainable, well-tested, secure, and performant code that adheres to design specifications. Continuously improves code performance, testability, maintainability, effectiveness, and cost, while learning about and accounting for relevant trade-offs. Applies metrics to drive code quality and stability. Applies appropriate coding patterns and best practices (e.g., leveraging state-of-the-art generative artificial intelligence [GenAI], approaches to source code organization, naming conventions). Identifies and escalates blockers or unknowns during the development process, communicates how they will impact timelines, and identifies strategies and/or opportunities to address them.
  • Leads discussions for and owns the architecture of products/solutions and creates proposals for architecture by testing design hypotheses and developing complex design specifications. Tests and explores various design options for a complex product/solution scenario, outlining strengths and weaknesses of each option. Independently collaborates with architects to build and modify complex products/solutions, providing feedback as needed. Owns or collaborates with other engineers on the architecture of solutions, with minimal technical oversight. Develops design documents that support user stories and other product requirements. Maintains awareness of the current technology landscape and determines how to integrate these technologies within existing systems. Shares learnings and identified solutions from investigations with the team and owns some design decisions. Ensures system architecture and individual designs meet performance, scalability, resiliency, cost of goods sold (COGS), disaster recovery, and other requirements and expectations. Upholds Microsoft standards of security, privacy, and other compliance requirements and expectations. Understands and coaches less experienced engineers on the importance of building solutions that expand upon the work of others. Drives the refinement of products through data analytics and makes informed decisions in engineering products through data integration. Reviews designs/architectures within and across teams to provide recommendations for improvements.
  • Drives efforts to ensure the correct processes are followed to achieve a high degree of security, privacy, safety, and accessibility. Creates and assures the presence of visible evidence (e.g., audit trail) to demonstrate compliance for products. Develops and maintains a deep understanding of the implications of onboarding new technologies, following expectations of compliance at Microsoft. Demonstrates and maintains an up-to-date understanding of both global and local regulations for technologies and system applications to ensure regulations are followed and met.
  • Leverages internal experimentation infrastructure and drives experiments that determine the impact of changes using feature flags/flighting in their code. Collaborates with internal partners (e.g., Data Science, product managers) to incorporate success and guardrail metrics for experimentation.
  • Maintains operations of live site service on a rotational, on-call basis, following security best practices when responding quickly to mitigate issues and using only the minimum required permissions. Implements solutions and mitigations to more complex issues impacting performance or functionality of live site service and escalates appropriately. Reviews and writes incident postmortems and presents insights that drive changes to reduce or eliminate incidents. Independently improves troubleshooting guides (TSGs), wikis, tests, and telemetry to make on-call better, and recommends user-facing support documentation and additional test coverage to reduce likelihood of future user-initiated incidents. Enables secure operations, security monitoring, and integration with live site investigation activities. Identifies and proposes opportunities (e.g., lunch talks, automation, practices, tools) that can be leveraged to improve the live site experience. Adds comprehensive observability and monitoring to services.

Benefits

  • Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay