Full Stack AI Native Claude Engineer

Apple•San Diego, CA

22h•Onsite

About The Position

Our team owns the end-to-end platform behind stability analysis at Apple: symbolication of crash logs across the company's hardware portfolio, the data pipelines that aggregate and cluster crash logs, and the applications and services that engineers across Apple use every day to drive operating-system quality. This role is about keeping that platform healthy, extending it deliberately, and making the engineering team itself more effective by using AI tools well.

Day to day, you'll spend most of your time on the engineering work of running real systems: tuning evaluation infrastructure, tightening operational controls, improving auditability and debug trails, and scaling the workflows our analysts rely on. When new capabilities are needed, you'll prototype and integrate them into the platform. You'll partner closely with stability analysts who are domain experts in OS reliability, and with the broader team responsible for symbolication, ETL, and service infrastructure. You'll also be expected to use AI-assisted development tools fluently to investigate issues, refactor at scale, and ship more with a small team.

We're looking for someone with the rigor of a seasoned production engineer who is also comfortable operating systems that include LLMs and agents as first-class components. If you enjoy taking responsibility for a complex, already-running platform and making it steadily better, we want to talk.

Requirements

5+ years of professional software engineering experience building and operating production systems
BS in Computer Science or a related field, or equivalent practical experience
Fluent use of AI-assisted development tools (coding agents, code review assistants, etc.) to work effectively at scale
Demonstrated experience designing and scaling distributed systems (load balancing, active-active topologies, capacity planning, throughput-bound services)
Track record of maintaining and evolving production services, including observability, operational controls, incident response, and steady iteration on existing systems
Strong full-stack instincts; comfortable spanning data infrastructure, backend services, and the user-facing surfaces that consume them
Proven ability to operate independently on ambiguous, open-ended problems where the right answer is not obvious

Nice To Haves

Experience operating LLM- or agent-based features in production environments over time
Experience building or maintaining evaluation harnesses, audit trails, or replay infrastructure for AI systems
Background in developer tools, observability, crash/stability analysis, or other operating-system-quality domains
Familiarity with one or more of Ruby on Rails, Node.js/TypeScript, or Python for production services
Experience working in environments with significant deferred scalability work (capacity-constrained, long-lead-time infrastructure)

Responsibilities

Keep large, AI-augmented systems running reliably at Apple scale.
Build and operate platforms, services, and infrastructure that turn crash reports from Apple devices into actionable engineering insights.
Work on systems where LLMs and agents are already part of the production fabric: evolve them, harden them, and use AI tools to extend what a small team can deliver.
Keep the stability analysis platform healthy and extend it deliberately.
Make the engineering team itself more effective by using AI tools well.
Tune evaluation infrastructure.
Tighten operational controls.
Improve auditability and debug trails.
Scale the workflows our analysts rely on.
Prototype and integrate new capabilities into the platform.
Partner closely with stability analysts and the broader team responsible for symbolication, ETL, and service infrastructure.
Use AI-assisted development tools fluently to investigate issues, refactor at scale, and ship more with a small team.
Operate systems that include LLMs and agents as first-class components.
Take responsibility for a complex, already-running platform and make it steadily better.