About The Position

We are looking for a Platform & Infrastructure Engineer to join a small, remote team responsible for building and maintaining the tools, processes, networking patterns, and standards that our engineering teams rely on. You will own Azure-based infrastructure — with a heavy emphasis on Azure Kubernetes Service — and work cross-functionally with product engineering teams to introduce changes, define standards, and understand their evolving needs. This is a hands-on technical role for someone who is self-directed, comfortable with ambiguity, and passionate about enabling other engineers to move faster and more safely. You will design and maintain infrastructure that scales, enforces security and compliance patterns, and brings consistency to how engineering is practiced across the organization. You believe the best solution is often the simplest one — you look to reduce complexity before reaching for something new, and you know that taking time to do things right is usually the fastest way forward.

Requirements

  • 4+ years of hands-on experience with Azure infrastructure in a production environment.
  • Deep experience with Azure Kubernetes Service (AKS): cluster management, networking, scaling, GitOps, and day-2 operations.
  • Strong understanding of cloud networking, including VNets, NSGs, private endpoints, DNS, and ingress/egress patterns.
  • Experience with infrastructure-as-code — Terraform preferred.
  • Proficiency with CI/CD tooling, particularly GitHub Actions.
  • Comfort working in a small, remote team with a high degree of autonomy and ownership.
  • Strong written and verbal communication skills — able to work cross-functionally, explain technical decisions clearly, and keep stakeholders informed.
  • Security-conscious approach to infrastructure design and operations.
  • Eastern or Central time zone required for team collaboration.

Nice To Haves

  • AKS operations with GitOps: experience using Argo CD.
  • Grafana: dashboard creation, alerting, and Azure resource monitoring.
  • GitHub Actions with Blacksmith and Tailscale for secure and performant CI/CD workflows.
  • Microsoft Entra ID: app registrations, managed identities, conditional access, and RBAC.
  • Terraform: modules, remote state management, and environment-specific configurations.
  • Experience building internal developer platforms or platform engineering tooling.
  • Ability to identify and codify patterns that reduce toil for the broader engineering organization.

Responsibilities

  • Design, build, and maintain Azure-based infrastructure, with a primary focus on Azure Kubernetes Service (AKS) for reliability, scalability, and developer experience.
  • Architect and operate infrastructure to support continuous availability — including zero-downtime deployments, automated rollouts, and the ability to scale capacity up and down in response to predictable demand peaks and quiet periods throughout the year.
  • Own system reliability and maintenance practices, including patching, upgrades, and configuration management across environments, ensuring infrastructure remains healthy, current, and audit-ready.
  • Develop and maintain disaster recovery and business continuity plans — including documented runbooks, tested recovery procedures, rollback strategies, and data recovery protocols that can be executed confidently when needed.
  • Develop and document reusable tools, networking patterns, and infrastructure templates for engineering teams to follow.
  • Collaborate cross-functionally with engineering teams to communicate upcoming infrastructure changes and to understand what they need.
  • Own and improve CI/CD pipelines built on GitHub Actions, ensuring fast, reliable, and secure delivery.
  • Manage infrastructure-as-code using Terraform, enabling repeatable and auditable provisioning across environments.
  • Implement and maintain observability and monitoring solutions, including Grafana dashboards and alerting, to provide teams with clear visibility into system health.
  • Manage identity and access using Microsoft Entra ID, applying least-privilege principles across services and teams.
  • Approach all infrastructure work with a security-first mindset — proactively identifying risks, enforcing compliance patterns, and communicating deviations from standard operating procedures.
  • Communicate clearly with stakeholders and adjacent teams on infrastructure changes, timelines, and dependencies.
  • Contribute to the team's knowledge base by creating runbooks, architecture documentation, and onboarding guides.

Benefits

  • Unlimited Vacation Policy + Sick Time + Holidays
  • Paid Parental Leave
  • Fully Remote Opportunity
  • Healthcare Benefits and 401K
  • Growing Startup-to-Scale-Up Culture
© 2024 Teal Labs, Inc