Systems Architect, Retail and Marcom Engineering

Apple • Austin, TX

About The Position

Marcom Engineering, a globally recognized engineering team, ensures seamless global communications across various media and platforms. Our products and services interact with hundreds of millions of Apple customers daily, enabling us to drive strategic marketing experiences. We’re committed to continuous learning and delivering global solutions. By collaborating with diverse teams of talented software engineers, we combine expertise to create interactive experiences. As a Systems Architect, you’ll set technical direction for cloud infrastructure, delivery platforms, and operational excellence programs, influencing how we build, ship, and scale the digital experiences that define the Apple brand. Your decisions will have impact across organizations, accelerate delivery for engineers, and raise the ceiling on Marcom Engineering’s technology. In an AI-driven world, you’ll lead the integration of intelligent automation, AI-assisted operations, and LLM-powered developer tooling into our engineering processes.

Requirements

Bachelor’s degree in Computer Science, Software Engineering, or a related field, or equivalent practical experience.
At least 12 years of hands-on experience in infrastructure, DevOps, platform, or software engineering, including at least 3 years in a senior role.
Expertise in cloud platforms (AWS, GCP), including network topology, identity and access management, cost governance, and multi-account strategy.
Proficiency in containerization and orchestration (Docker, Kubernetes, Helm, Kustomize, service mesh).
Proficiency in infrastructure-as-code (Terraform, Pulumi, Ansible), configuration management, state management, modularity, and GitOps.
Experience designing and operating CI/CD systems (Jenkins, Spinnaker, ArgoCD, GitHub Actions) and creating pipelines for large teams.
Proficiency in at least two programming languages (Python, Go, Java) for tooling and automation.
Verbal and written communication skills for presenting complex architectural trade-offs to engineering and executive audiences.

Nice To Haves

15+ years of experience in infrastructure or platform engineering, especially in fast-paced, large-scale consumer-facing technology environments.
Experience architecting end-to-end MLOps platforms, including model registries, experiment tracking, automated retraining pipelines, A/B testing infrastructure, and production model observability.
Expertise in LLM infrastructure, including hosting and fine-tuning large language models, RAG pipelines, MCP server creation and integration, vector databases, and prompt engineering at scale.
Experience implementing AIOps solutions that automate or augment on-call operations, including predictive alerting, automated root cause analysis, self-healing runbooks, and capacity forecasting.
Familiarity with AI safety and governance, including model drift detection, bias monitoring, explainability, and audit trails.
Understanding of FinOps principles applied to AI workloads, including GPU cost optimization, spot instance strategies, and inference cost modeling.
Experience building internal developer platforms with AI-assisted features like natural language queries, AI-generated runbooks, and LLM-augmented incident postmortems.
Experience in platform modernization, including bare-metal to cloud migrations, monolith decomposition, and legacy CI/CD re-platforming.
Experience with edge computing, CDN architecture, and globally distributed cache and content delivery strategies for large-scale web properties.
Hands-on experience in chaos engineering and advanced reliability practices, including failure injection, game days, capacity modeling, and traffic shaping.
Track record of publishing architecture decisions, internal white papers, or cross-org RFCs that influenced platform direction.
Contributions to open-source infrastructure or AI/ML tooling projects, or active participation in DevOps and AI engineering communities.
Grasp of application and infrastructure security (zero-trust, secrets management, vulnerability management, compliance frameworks).
Cloud, DevOps, or AI certifications.

Responsibilities

Define and own the multi-year technical roadmap for DevOps infrastructure, CI/CD platforms, and cloud architecture.
Establish architecture standards, reference designs, and platform blueprints for team adoption.
Lead the adoption of infrastructure tooling, cloud services, and AI-driven automation frameworks, weighing trade-offs, cost, and operational impact.
Implement company-wide initiatives to boost developer productivity, expedite deployment, and enhance platform reliability, with clear goals and progress tracking.
Oversee major infrastructure migrations, platform consolidations, and re-architectures.
Collaborate with SRE, Security, and Compliance to design the platform for reliability, observability, and security.
Identify and resolve problems in the software development process, including reliability, scalability, and ease of development, and lead your team in solving them.
Develop and execute platform strategies for AI/ML workload delivery, including model training infrastructure, LLM inference serving, feature pipelines, and AIOps integration.
Mentor senior and staff engineers, conduct architecture reviews, provide design feedback, and improve technical standards.
