About The Position

In Apple’s iCloud services organization, efficiency is not simply a technical objective; it is a fundamental part of how we deliver reliable, scalable, and sustainable infrastructure for billions of users worldwide. The iCloud Efficiency team is responsible for improving how Apple’s cloud services utilize compute, storage, and operational resources at massive scale. As infrastructure complexity grows, applying Generative AI, intelligent automation, and agentic systems becomes increasingly critical to accelerating operational excellence, improving engineering productivity, and optimizing resource efficiency.

As a Senior iCloud Efficiency Engineer focused on GenAI and Agentic Systems, you will work at the intersection of large-scale systems engineering, infrastructure automation, AI-assisted operations, and intelligent decision systems. You will provide technical leadership for streamlining GenAI efforts across the organization: establishing reusable patterns, defining production standards, and helping teams converge on durable, safe, and measurable AI-assisted infrastructure workflows. You will apply state-of-the-art production LLM systems, retrieval-augmented generation (RAG), skills-based automation, agentic workflows, and evaluation and orchestration frameworks to transform how engineering teams operate, troubleshoot, forecast, and optimize cloud infrastructure.

This role involves partnering closely with data engineering, data science, infrastructure engineering, software reliability engineering, and finance teams to design and deploy AI-driven systems that improve efficiency across capacity planning, anomaly detection, operational workflows, deployment safety, and infrastructure optimization. Your work will directly influence the operational and financial efficiency of one of the world’s largest private cloud environments, supporting iCloud, Apple Intelligence, and Private Cloud Compute (PCC).

The Senior iCloud Efficiency Engineer will play a critical role in advancing Apple’s next generation of intelligent infrastructure operations through applied GenAI and agentic technologies. This role focuses on building practical, high-impact AI systems that improve engineering workflows and infrastructure decision-making. You will identify high-leverage operational problems, set architecture direction, design agentic solutions, and guide teams from prototype to production adoption. The goal is to combine LLM reasoning, system context, automation frameworks, and engineering safeguards to improve speed, reliability, and efficiency. Success in this role will be measured by concrete outcomes: adoption of shared patterns and tools by multiple teams, measurable toil reduction, and validated cost or capacity savings. You will help define how AI can safely and effectively augment engineering teams, from capacity optimization and deployment analysis to incident response, forecasting, and infrastructure planning.

Requirements

  • 5+ years of experience in software engineering, infrastructure engineering, or large-scale cloud services environments
  • Proven experience designing, building, or technically leading production GenAI, ML platform, developer productivity, infrastructure automation, or tooling systems
  • Hands-on experience with GenAI technologies and LLM application architecture, including retrieval, context engineering, tool use, workflow orchestration, agentic workflows, evaluation, observability, and failure handling
  • Demonstrated technical leadership across teams, including architecture reviews, roadmap influence, mentoring, and driving adoption of shared engineering practices
  • Strong understanding of cloud infrastructure operations, observability, deployment systems, and operational safety principles
  • Proven ability to translate ambiguous operational challenges into practical engineering solutions with measurable business impact
  • Strong software development skills in Python, Java, or similar languages
  • Exceptional analytical, systems thinking, and cross-functional communication skills
  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field

Nice To Haves

  • Experience applying GenAI to infrastructure operations, SRE workflows, capacity planning, or engineering productivity systems
  • Experience building AI systems with operational guardrails, governance models, and safe deployment patterns for enterprise environments
  • Strong understanding of capacity forecasting, cost optimization, and infrastructure efficiency modeling at hyperscale
  • Background working in private cloud environments, large-scale storage systems, or global distributed infrastructure
  • PhD or other advanced degree in Computer Science, Machine Learning, Distributed Systems, or a related field

Responsibilities

  • Provide technical leadership for streamlining GenAI efforts across the organization: establishing reusable patterns, defining production standards, and helping teams converge on durable, safe, and measurable AI-assisted infrastructure workflows.
  • Apply state-of-the-art production LLM systems, retrieval-augmented generation (RAG), skills-based automation, agentic workflows, and evaluation and orchestration frameworks to transform how engineering teams operate, troubleshoot, forecast, and optimize cloud infrastructure.
  • Partner closely with data engineering, data science, infrastructure engineering, software reliability engineering, and finance teams to design and deploy AI-driven systems that improve efficiency across capacity planning, anomaly detection, operational workflows, deployment safety, and infrastructure optimization.
  • Build practical, high-impact AI systems that improve engineering workflows and infrastructure decision-making.
  • Identify high-leverage operational problems, set architecture direction, design agentic solutions, and guide teams from prototype to production adoption.
  • Help define how AI can safely and effectively augment engineering teams, from capacity optimization and deployment analysis to incident response, forecasting, and infrastructure planning.