About The Position

The Senior Site Reliability Engineer (SRE) owns the operational stability and performance of Juul’s hybrid cloud infrastructure (Nutanix, AWS, and GCP). The role leads automation efforts, architects for reliability, and serves as the final escalation point for critical incidents, keeping the platform scalable and efficient.

Requirements

  • 8-12+ years of infrastructure experience, with 8+ years in Nutanix HCI and enterprise cloud (AWS/GCP)
  • Expert-level skills in Python, PowerShell, Bash scripting, infrastructure-as-code (Terraform/CloudFormation), and container orchestration (Kubernetes, EKS/GKE)
  • Proven experience managing enterprise-scale environments, hybrid cloud migrations, disaster recovery, and L3 critical incident management
  • Strong networking knowledge (TCP/IP, VLANs, routing, VPN), security hardening, and compliance frameworks (ITIL)
  • Strategic thinker with exceptional analytical and troubleshooting abilities for complex multi-layer infrastructure issues
  • Excellent communication skills to translate technical concepts to executives and non-technical stakeholders
  • Calm under pressure during critical outages with meticulous attention to security, compliance, and configuration management
  • Self-motivated continuous learner committed to staying current with evolving cloud technologies and automation opportunities
  • Available for on-call rotations with strong documentation skills and customer service orientation

Nice To Haves

  • Nutanix NCP/NCAP
  • AWS Solutions Architect Professional
  • AWS DevOps Professional
  • GCP Professional Cloud Architect
  • Terraform

Responsibilities

  • Design, deploy, and maintain enterprise-scale Nutanix AHV clusters and Prism Central for multi-cluster management
  • Use the Nutanix CLIs (nCLI and aCLI) at an expert level for advanced operations, troubleshooting, and automation
  • Develop automation scripts using Nutanix REST APIs, Python SDK, PowerShell, and Terraform for infrastructure-as-code
  • Create and manage VM templates, golden images, and standardized deployment catalogs for consistent provisioning
  • Design disaster recovery solutions using Leap, Protection Domains, cross-cluster replication, and metro clustering
  • Implement network micro-segmentation using Nutanix Flow and configure RBAC, encryption, and security hardening
  • Lead L3 troubleshooting using advanced diagnostics, log analysis (CVM, Genesis), NCC health checks, and cluster service resolution
  • Configure high availability, VM affinity rules, QoS policies, and optimize performance for mission-critical workloads
  • Manage AHV networking with OVS bridges, VLANs, bonds, and LACP, and implement resource reservations and workload balancing
  • Design, deploy, and maintain hybrid cloud infrastructure across Nutanix HCI, AWS, and GCP platforms
  • Architect and implement multi-cloud solutions ensuring high availability, scalability, and disaster recovery
  • Architect and deploy enterprise-scale, highly available multi-cloud solutions across AWS and GCP with multi-region/multi-account strategies
  • Use the AWS CLI, GCP CLI (gcloud), cloud SDKs, boto3, and Python at an expert level for advanced automation and infrastructure orchestration
  • Design AWS Organizations and GCP Organization hierarchies with consolidated billing, IAM policies, and centralized governance
  • Configure and manage AWS Systems Manager (SSM) including Session Manager, Run Command, State Manager, and Automation for centralized fleet operations
  • Implement centralized logging using CloudWatch/CloudTrail and GCP Cloud Logging with S3/Cloud Storage aggregation
  • Integrate AWS and GCP with Splunk using HEC, CloudWatch subscriptions, Pub/Sub, Dataflow, and cloud-specific add-ons for SIEM correlation
  • Design and deploy advanced load balancing solutions with AWS ALB/NLB/ELB and GCP Cloud Load Balancing including SSL termination and auto-scaling
  • Develop infrastructure-as-code using Terraform, CloudFormation, and CDK for repeatable multi-cloud deployments and CI/CD pipelines
  • Configure AWS SSO, cross-account IAM roles, GCP Workload Identity, and federated access for centralized identity management
  • Design VPC architectures with AWS Transit Gateway/PrivateLink and GCP Shared VPC/VPC peering for hybrid connectivity
  • Manage containerized workloads using EKS, GKE, ECS, and Cloud Run, with service mesh, observability, and security best practices
  • Implement disaster recovery using AWS Backup, Cross-Region Replication, GCP snapshots, and multi-region failover strategies
  • Lead L3 troubleshooting using CloudWatch Insights, GCP Cloud Trace, VPC Flow Logs, X-Ray, and vendor support escalation
  • Perform cost optimization through Reserved Instances, Committed Use Discounts, rightsizing, and automated resource lifecycle management
  • Administer and support Windows Server and Unix/Linux environments in production and non-production settings
  • Perform OS-level hardening, patch management, and security compliance across heterogeneous systems
  • Automate routine administrative tasks using PowerShell, Bash, Python, or similar scripting languages
  • Manage GitHub organization settings, user permissions, repository access controls, and monitor GitHub Actions workflows and repository health across multiple teams
  • Configure Splunk forwarders, heavy forwarders, and other integrations for data ingestion from cloud and on-premises sources

Benefits

  • Equity and performance bonuses
  • Cell phone subsidy
  • Commuter benefits
  • Discounts on JUUL products
  • Excellent medical, dental, and vision coverage
  • Disability insurance
  • Life insurance
  • Family support
  • Wellness programs
  • Legal assistance
  • Employee assistance program
  • 401(k) plan with company matching
  • Biannual discretionary performance bonuses