Apple • Posted 7 months ago
Full-time • Mid Level
Austin, TX
Computer and Electronic Product Manufacturing

At Apple, scale and innovation converge to empower world-class engineering. The Hardware Methodologies, Tool, & Solutions (HMTS) Platform team is seeking a seasoned Infrastructure DevOps Engineer with a focus on HPC, datacenter, and global systems. In this role, you will architect, deploy, and manage high-performance compute and enterprise infrastructure across Apple's global platforms, bridging high-throughput technical compute with global system integration and supporting thousands of nodes, global data flows, and mission-critical services. As a key member of the HMTS Platform team, you will serve as a vital connector between conventional IT infrastructure development and operations. Your contributions will be crucial in delivering an exceptional design environment for hardware engineering, supporting Apple's commitment to leading innovation in hardware.

Responsibilities
  • Design and operate scalable Slurm-based HPC clusters spanning 2,000+ nodes across global sites.
  • Lead infrastructure automation for provisioning, monitoring, and configuration management of compute and enterprise services.
  • Manage and tune high-availability services and support site-aware routing, load balancing, and DNS-based traffic distribution.
  • Serve as the expert on Active Directory integrations, trust relationships, replication latency troubleshooting, and directory service hardening.
  • Architect virtualization-based solutions where appropriate to support auxiliary services, container workloads, and hybrid edge compute nodes.
  • Oversee secure data replication strategies between sites, integrating load-balancer VIPs and geo-distributed failover configurations.
  • Provide root-cause analysis for performance bottlenecks, host instability, or data inconsistencies across global platforms.
  • Work closely with platform owners, security teams, and datacenter engineers to evolve the infrastructure toward a zero-touch, self-healing architecture.

Qualifications
  • A Bachelor's degree in Computer Science with several years of relevant experience.
  • Proven experience in a DevOps role in an enterprise environment with private and public cloud exposure.
  • Proven experience in a Systems Admin or Systems/IT support role in an enterprise environment.
  • 7+ years of experience operating large-scale, production-grade datacenter or HPC environments (2,000+ nodes).
  • Expert-level Windows Server administration, including Active Directory, GPO, DNS, DHCP, and DFS for distributed enterprise environments.
  • Deep experience with RHEL/CentOS and infrastructure tuning for high-performance, low-latency workloads.
  • Advanced knowledge of global networking concepts: routing, DNS failover, site-aware load balancers, VIP configuration, and traffic shaping.
  • Strong hands-on experience with enterprise virtualization platforms (VMware vSphere/ESXi, Hyper-V) for production and edge workloads.
  • Proficient in infrastructure automation and scripting with PowerShell, Python, and Ansible.
  • Experience with InfiniBand fabrics and high-bandwidth data interconnects in compute environments.
  • Deep understanding of infrastructure observability using Prometheus, Grafana, Nagios, Splunk, or equivalent tools.
  • Proven success managing global replication services and multi-region compute/data platforms.
  • Excellent cross-functional communication and documentation skills, with the ability to influence and mentor across global teams.