About The Position

As Microsoft continues to push the boundaries of AI, we are on the lookout for passionate individuals to work with us on the most interesting and challenging AI questions of our time. Our vision is bold and broad: to build systems that have true artificial intelligence across agents, applications, services, and infrastructure. It's also inclusive: we aim to make AI accessible to all (consumers, businesses, and developers) so that everyone can realize its benefits.

We're looking for a Member of Technical Staff - Principal Data Infrastructure Engineer. This role is a dynamic blend of Platform Engineering, DevOps/SRE, and Big Data Infrastructure Engineering, focused on enabling large-scale data and ML pipelines and intelligent systems. If you've architected big data platforms from the ground up and are eager to apply that expertise to consumer AI, we want to hear from you.

You'll bring:

  • Deep technical expertise
  • A passion for automation and observability
  • Fluency in distributed systems
  • Creativity to design scalable solutions
  • And just as importantly: empathy, collaboration, and a growth mindset

Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees, we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Starting January 26, 2026, Microsoft AI (MAI) employees who live within a 50-mile commute of a designated Microsoft office in the U.S., or a 25-mile commute of a non-U.S., country-specific location, are expected to work from the office at least four days per week. This expectation is subject to local law and may vary by jurisdiction.

Requirements

  • Master's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or a related field AND 4+ years of experience in business analytics, data science, software development, data modeling, or data engineering; OR
  • Bachelor's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or a related field AND 6+ years of experience in business analytics, data science, software development, data modeling, or data engineering; OR
  • Equivalent experience.

Nice To Haves

  • 4+ years in Big Data Infrastructure, DevOps, SRE, or Platform Engineering.
  • 3+ years of hands-on experience managing and scaling distributed systems—from bare-metal to cloud-native environments.
  • 2+ years deploying containerized applications using Kubernetes and Helm/Kustomize.
  • Solid scripting and automation skills using Python, Bash, or PowerShell.
  • Proven success in CI/CD pipeline management, release automation, and production troubleshooting.
  • Experience working with Databricks for scalable data processing and analytics.
  • Familiarity with security practices in infrastructure environments, including IAM, OAuth, and Kerberos administration.
  • Proven experience with cloud-native infrastructure across Azure, AWS, or GCP.
  • Hands-on expertise with modern data platforms such as Databricks.
  • Deep understanding of data storage and processing technologies, including:
      • Relational and NoSQL databases
      • Key-value stores
      • Spark compute engines
      • Distributed file systems (e.g., HDFS, ADLS Gen2)
      • Messaging systems (e.g., Event Hub, Kafka, RabbitMQ)
  • Capacity planning and incident management for large-scale big data systems.
  • Solid collaboration history with Data Engineers, Data Scientists, ML Engineers, Networking, and Security teams.
  • Familiarity with modern web stacks: TypeScript, Node.js, React, and optionally PHP.
  • Exposure to agentic workflows, deep learning, or AI frameworks.
  • Practical experience integrating LLMs (e.g., GPT-based models) into daily workflows—automating documentation, code generation, reviews, and operational intelligence.
  • Solid grasp of prompt engineering techniques to design, optimize, and evaluate interactions with LLMs.
  • Demonstrated ability to troubleshoot and resolve complex performance and scalability issues across infrastructure layers.
  • Excellent interpersonal and communication skills, with a passion for mentorship and continuous learning.
  • Experience applying LLMs to DevOps workflows, enhancing incident response, and streamlining cross-functional collaboration is a strong advantage.

Responsibilities

  • Architect and maintain scalable, reliable, and observable Big Data Infrastructure for mission-critical AI applications.
  • Champion DevOps and SRE best practices—automated deployments, service monitoring, and incident response.
  • Build a self-service big data platform that empowers data and platform engineers and researchers.
  • Develop robust CI/CD pipelines and automate infrastructure provisioning using Infrastructure as Code tools (Bicep, Terraform, ARM).
  • Collaborate with Data Engineers, Data Scientists, AI Researchers, and Developers to deliver secure, seamless big data workflows.
  • Lead technical design reviews and uphold a clean, secure, and well-documented codebase.
  • Proactively identify and resolve bottlenecks in data pipelines and infrastructure.
  • Optimize system performance across storage, compute, and analytics layers.
  • Partner with Security teams to enhance system security (IAM, OAuth, Kerberos).
  • Embody and promote Microsoft’s values: Respect, Integrity, Accountability, and Inclusion.