Infrastructure DevOps Engineer

Apple, Austin, TX

About The Position

At Apple, scale and innovation converge to empower world-class engineering. The Hardware Methodologies, Tool, & Solutions (HMTS) Platform team is seeking a seasoned Infrastructure DevOps Engineer with a focus on HPC, datacenter, and global systems. In this role, you will architect, deploy, and manage high-performance compute and enterprise infrastructure across Apple's global platforms. You will bridge high-throughput technical compute with global system integration, supporting thousands of nodes, global data flows, and mission-critical services. As a key member of the HMTS Platform team, you will serve as a vital connector between conventional IT infrastructure development and operations. Your contributions will be crucial in delivering an exceptional design environment for hardware engineering, supporting Apple's commitment to leading innovation in hardware.

Requirements

  • A Bachelor's degree in Computer Science with several years of relevant experience.
  • Proven experience in a DevOps role in an enterprise environment with private and public cloud exposure.
  • Proven experience in a Systems Admin or Systems/IT support role in an enterprise environment.

Nice To Haves

  • 7+ years of experience operating large-scale, production-grade datacenter or HPC environments (2,000+ nodes).
  • Expert-level Windows Server administration, including Active Directory, GPO, DNS, DHCP, and DFS for distributed enterprise environments.
  • Deep experience with RHEL/CentOS and infrastructure tuning for high-performance, low-latency workloads.
  • Advanced knowledge of global networking concepts: routing, DNS failover, site-aware load balancers, VIP configuration, and traffic shaping.
  • Strong hands-on experience with enterprise virtualization platforms (VMware vSphere/ESXi, Hyper-V) for production and edge workloads.
  • Proficient in infrastructure automation and scripting with PowerShell, Python, and Ansible.
  • Experience with InfiniBand fabrics and high-bandwidth data interconnects in compute environments.
  • Deep understanding of infrastructure observability using Prometheus, Grafana, Nagios, Splunk, or equivalent tools.
  • Proven success managing global replication services and multi-region compute/data platforms.
  • Excellent cross-functional communication and documentation skills, with the ability to influence and mentor across global teams.

Responsibilities

  • Design and operate scalable, globally distributed Slurm-based HPC clusters spanning 2,000+ nodes.
  • Lead infrastructure automation for provisioning, monitoring, and configuration management of compute and enterprise services.
  • Manage and tune high-availability services and support site-aware routing, load balancing, and DNS-based traffic distribution.
  • Serve as the expert in Active Directory integrations, trust relationships, replication latency troubleshooting, and directory service hardening.
  • Architect virtualization-based solutions where appropriate to support auxiliary services, container workloads, and hybrid edge compute nodes.
  • Oversee secure data replication strategies between sites, integrating load-balancer VIPs and geo-distributed failover configurations.
  • Provide root-cause analysis for performance bottlenecks, host instability, or data inconsistencies across global platforms.
  • Work closely with platform owners, security teams, and datacenter engineers to evolve the infrastructure toward a zero-touch, self-healing architecture.

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Industry: Computer and Electronic Product Manufacturing
  • Education Level: Bachelor's degree
