Principal Software Automation Engineer

Microsoft, Raleigh, NC

About The Position

Microsoft Silicon Cloud Hardware Infrastructure Engineering (SCHIE) is the team behind Microsoft’s expanding cloud infrastructure and is responsible for powering Microsoft’s “Intelligent Cloud” mission. SCHIE delivers the core infrastructure and foundational technologies for more than 200 Microsoft online businesses, including Bing, MSN, Office 365, Xbox Live, Skype, OneDrive, and the Microsoft Azure platform, through our global server and data center infrastructure, security and compliance, operations, globalization, and manageability solutions. Our focus is on smart growth, high efficiency, and delivering a trusted experience to customers and partners worldwide.

We are seeking a Principal Software Automation Engineer to define, scale, and govern automation standards across HPC infrastructure, operational services, and Azure-connected platforms. This is a high-impact player/coach role: you will set the technical vision for the Automation Center of Excellence (CoE), serve as the organization’s escalation point for complex automation challenges, and grow a high-performing automation team as scope and funding mature. The role combines hands-on delivery with strategic leadership to reduce operational toil, improve reliability, and accelerate delivery across on-prem HPC and cloud-integrated environments that support Silicon Development.

Requirements

  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Nice To Haves

  • Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • 10+ years building production software, automation platforms, or infrastructure tooling
  • 7+ years of experience in a technical lead role.
  • Problem-solving skills and a collaborative attitude.
  • Exceptional oral and written communication skills with a proven ability to present complex technical information to leadership
  • Team player, collaboration skills, and positive attitude
  • Strong coding skills (e.g., Python, Go, C#, Java) backed by rigorous testing and CI/CD practices

Responsibilities

  • CoE Leadership & Technical Authority: Own the end-to-end automation strategy for HPC, operational platforms, and Azure integrations. Define reference architectures, standards, and coding methodologies. Serve as the highest-level technical escalation point for automation, reliability, and integration challenges across the org.
  • Roadmaps & Standards: Create and maintain multi-year automation roadmaps aligned to business priorities. Establish coding standards, testing strategies, code quality gates, security baselines, and operational readiness criteria for adoption across teams.
  • Team Leadership: Build, mentor, and technically lead a software automation team over time. Set hiring bar, role definitions, and career paths; coach senior engineers; lead by example through hands-on contributions.
  • Hands-on Engineering (Principal IC): Architect, design, implement, and operate production-grade automation platforms for HPC infrastructure and cloud services.
  • Operational Automation at Scale: Eliminate manual and error-prone work by codifying provisioning, imaging, patching, validation, break/fix, incident response, and self-healing remediation workflows.
  • Platform & Service Integrations: Design robust API-first, event-driven, and asynchronous integrations across internal HPC service platforms and Azure-native services.
  • ETL & Data Engineering: Build and evolve data pipelines that ingest, transform, and validate telemetry, logs, metrics, and operational signals. Enable reliability analysis, capacity forecasting, cost optimization, and executive reporting.
  • Azure Automation & Governance: Lead infrastructure-as-code, CI/CD pipelines, identity and access automation (RBAC), policy enforcement, secrets management, and monitoring with security-by-default and compliance-aware practices.
  • Reliability & Observability: Define SLOs/SLIs for critical services; standardize logging, metrics, and tracing; implement automated detection, alerting, and recovery to improve availability and reduce MTTR.
  • Cross-Org Influence: Partner with infrastructure, cloud, CAD, and security teams to align priorities, unblock dependencies, and drive adoption of CoE standards and platforms.
  • Technical Reviews & Decision Making: Lead architecture and design reviews, assess trade-offs, and make durable technical decisions that balance reliability, velocity, cost, and risk.