About The Position

Joining the CoreAI organization at Microsoft means becoming part of the team that builds the end-to-end AI stack powering Azure’s innovation. As a member of the FIT training team within CoreAI, you will help develop the AI infrastructure that accelerates the creation of agentic AI systems across Microsoft. This role is dedicated to advancing scientific methods and scalable infrastructure for training agentic models to achieve frontier-level performance. You will contribute to LLMs, SLMs, and agentic models using both proprietary and open-source frameworks, all aimed at delivering reliable, enterprise-grade agentic workflows.

We are seeking a curious, independent, adaptable problem-solver who thrives on continuous learning, embraces changing priorities, and is motivated by creating meaningful impact. Candidates must be able to lead and serve as a role model for a driven team, write efficient code, debug complex training jobs, document findings, and demonstrate a track record of continuous improvement. In addition, we value an agile, startup-style mindset: someone who can iterate quickly, pivot when needed, and collaborate effectively in fast-paced, dynamic environments.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond. In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day.

Requirements

  • Bachelor's Degree in Computer Science or related technical field and 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python, or equivalent experience.
  • Experience in distributed computing and architecture, and/or developing and operating high scale, reliable online services.

Nice To Haves

  • Knowledge of and experience with Docker, Kubernetes, CI/CD pipelines, and DevOps for microservices running in Kubernetes clusters.
  • Experience with the Rust programming language.
  • Practical experience working on real-world applications that create or customize AI agents to automate real-world tasks.
  • Experience in developing low latency systems.
  • Experience working in a geo-distributed team.
  • Understanding of parallel algorithms for communication between GPUs, and familiarity with related libraries and frameworks such as DeepSpeed and PyTorch Distributed.
  • Knowledge of LLM architectures, e.g. GPT, Claude, and DeepSeek.

Responsibilities

  • Engage directly with key partners to understand and implement complex inferencing and agentic capabilities for Microsoft Copilot and other Microsoft products and Azure services.
  • Design and implement an API orchestration layer by leveraging OpenAI models, tools, and capabilities.
  • Work on cutting-edge agentic platforms to automate and solve real-world problems with the latest reasoning AI models.
  • Work with cutting-edge hardware stacks and a fast-moving software stack to deliver best-in-class inference at optimal cost.
  • Anticipate, identify, assess, track, and mitigate project risks and issues in a fast-paced, startup-like environment.
  • Build constructive and effective relationships and solve problems collaboratively.
  • Support production inference SLAs for core AI scenarios on one of the largest GPU fleets in the world.