About The Position

Join Microsoft’s CoreAI – AI Platform team in the Bay Area or Redmond to build the AI Data Platform, the foundation for secure, scalable, reusable datasets that power model development. We seek Software Engineers passionate about large-scale data infrastructure, automation tools, and intelligence services to transform how Microsoft collects, generates, manages, and shares AI training data. Our mission is to build a central AI data platform that breaks down Microsoft’s data silos and manages the full lifecycle of first-party, third-party, synthetic, and human-labeled data, accelerating AI model development with secure, reusable, and compliant datasets.

Requirements

  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years.

Nice To Haves

  • M.S. or Ph.D. in Computer Science, Engineering, or related field OR equivalent practical experience.
  • 4+ years of experience in software engineering.
  • Proficiency in one or more programming languages (e.g., C#, Java, Python).
  • Experience with distributed systems, cloud services (Azure, AWS, or GCP), or large-scale data pipelines.
  • Experience with data lifecycle management (e.g., ingestion, validation, discovery, governance).
  • Knowledge of privacy, compliance, and security practices in large-scale data platforms.
  • Familiarity with AI/ML workflows, training data preparation, and LLM-based synthetic data generation for training, reward modeling and agents.
  • Strong problem-solving, communication, and collaboration skills.

Responsibilities

  • Design and build scalable data pipelines and services to automate the dataset lifecycle (ingestion, registration, validation, PII handling, discovery, sharing, lineage), including intelligent agent-driven automation for key stages.
  • Develop secure and reliable infrastructure for data access, entitlement management, and operational support across global time zones.
  • Implement governance and compliance tooling to ensure data integrity, auditability, and adherence to regulatory standards.
  • Create user-facing tools and APIs that make datasets easily discoverable and reusable.
  • Contribute to strategic extensions such as continuous feedback loops, human-in-the-loop workflows, and data intelligence services for internal and external stakeholders.
  • Collaborate with cross-org partners to align priorities and deliver company-wide impact.