Director, Software Engineering

ResMed, San Diego, CA
Hybrid

About The Position

At ResMed, we are pioneering the future of digital healthcare by delivering reliable, performant, and resilient systems that serve millions of patients, providers, and partners worldwide. We are seeking a Principal Software Engineer, Production Engineering to lead the reliability, scalability, and operational excellence of our production systems. This role is for a deeply technical expert who thrives in high-scale distributed environments and takes ownership of ensuring systems run flawlessly in production. You will operate at the intersection of software engineering and site reliability engineering (SRE), driving system stability, diagnosing complex issues, and building the processes and tooling that enable engineering teams to deliver resilient, high-performing products. In this hands-on technical leadership role, you will partner with SRE and engineering teams across the globe to elevate production readiness and operability. If you are passionate about keeping complex systems running at scale, mentoring engineers to think production-first, and driving a culture of engineering rigor and continuous improvement, this role is for you.

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Software Engineering, or a related field
  • 10+ years of experience in software engineering, with significant focus on production systems and reliability
  • Proven expertise in debugging complex distributed systems at scale
  • Deep understanding of microservices architectures and cloud-native systems
  • Strong experience with AWS and Azure, as well as on-premise environments
  • Solid understanding of networking fundamentals (TCP/IP, DNS, load balancing, proxies, etc.)
  • Experience with observability tools (e.g., Datadog)
  • Strong programming skills in one or more languages (e.g., Java, C#, Go, Python)
  • Experience leading incident response and root cause analysis in production environments
  • Experience in healthcare, medical devices, or regulated industries where quality systems, data privacy, and compliance are not optional

Nice To Haves

  • Strategic Thinker – Aligns technology and engineering outcomes with ResMed’s long-term vision and business goals.
  • Innovator – Embraces new ideas and fosters experimentation to accelerate digital transformation.
  • Technical Proficiency – Demonstrates deep expertise in modern software engineering, architecture, and cloud-native development.
  • Problem Solver – Solves the hardest problems at scale.
  • System Thinker – Understands how complex systems interact and behave.
  • Build Relationships (Collaboration) – Develops strong partnerships across teams to achieve shared outcomes.
  • Develop People – Empowers and enables others to perform at their best.
  • Lead Change – Drives purposeful transformation with clarity and confidence.
  • Think Critically – Applies evidence-based reasoning to solve complex challenges.
  • Communicate Clearly – Shares insights transparently, concisely, and with purpose.
  • Create Accountability – Establishes clear expectations and ownership for results.

Responsibilities

  • Act as a technical expert for production systems, focusing on reliability, performance, and scalability
  • Lead deep debugging and root cause analysis of complex issues in distributed systems
  • Partner with SRE & engineering teams to diagnose and resolve production incidents, reducing customer impact
  • Contribute to and improve incident response, escalation, and postmortem practices
  • Guide teams in defining and applying SLIs, SLOs, and error budgets
  • Provide technical leadership in designing and reviewing large-scale distributed systems and microservices architectures
  • Identify systemic risks, bottlenecks, and failure modes, and drive improvements in system resilience
  • Collaborate with engineering teams to ensure systems are designed for operability and production readiness
  • Design and implement improvements to observability (logging, metrics, tracing)
  • Build and advocate for tools that enhance debugging, monitoring, and operational insight
  • Reduce operational toil through automation and better tooling
  • Provide deep expertise across AWS, Azure, and on-premise environments
  • Support teams in optimizing infrastructure for scalability, reliability, and cost efficiency
  • Influence best practices in deployment, release, and rollback strategies
  • Mentor and coach engineers on production engineering and reliability practices
  • Lead by example through hands-on problem solving in critical situations
  • Raise the overall engineering bar through knowledge sharing and technical guidance
  • Champion AI transformation by evolving how your teams build software, not just what they build: advocate for AI-assisted development, agentic workflows, and the composed product engineer model, in which small teams deliver outsized impact
  • Contribute to defining and evolving production engineering and SRE best practices
  • Drive adoption of consistent operational standards and practices across teams
  • Promote a culture of continuous improvement through learning and systemic fixes

Benefits

  • Comprehensive medical, vision, dental, life, AD&D, and short-term and long-term disability insurance, plus sleep care management, Health Savings Account (HSA), Flexible Spending Account (FSA), commuter benefits, 401(k), Employee Stock Purchase Plan (ESPP), Employee Assistance Program (EAP), and tuition assistance
  • Three weeks of Paid Time Off (PTO) accrued in the first year of employment
  • 11 paid holidays plus 3 floating days
  • 14 weeks of primary caregiver leave or two weeks of secondary caregiver leave when welcoming new family members