Applied Machine Learning Research Engineer - Multimodal for Human Understanding

Apple Inc. • Sunnyvale, CA

About The Position

We're starting to see the incredible potential of multimodal foundation and large language models, and many applications in the computer vision and machine learning domain that previously appeared infeasible are now within reach. We are looking for a highly motivated and skilled Applied Machine Learning Research Engineer to join our team in the Video Computer Vision group and help us push the boundaries of human understanding.

The Video Computer Vision org has pioneered human-centric real-time features such as FaceID, FaceKit, and gaze and hand gesture control, which have changed the way millions of users interact with their devices. We balance research and product requirements to deliver Apple-quality, pioneering experiences, innovating through the full stack and partnering with hardware, software, and AI teams to shape Apple's products and bring our vision to life.

In this role, you will drive groundbreaking development at the intersection of AI, generative modeling, and computer vision. You will work across the full lifecycle, from foundational investigation to practical applications: designing, implementing, and evaluating novel algorithms and models. Your primary focus will be human understanding, including human motion, activities, and representation learning. A major aspect of the role involves designing, implementing, evaluating, and productizing ML systems capable of human and activity understanding.

This position offers a unique opportunity to innovate, build, and ship: you will take your conceptual ideas to products that reach millions of users worldwide. You will collaborate with a diverse group of experts (research scientists, ML engineers, software engineers, data scientists, human-interface designers, and domain specialists) in an environment that values experimentation, ownership, and continuous learning.

By staying at the forefront of advancements in AI, machine learning, and computer vision, you will play a direct role in driving innovation, influencing the evolution of Apple products, and meaningfully enhancing user experience on a global scale.

Requirements

Hands-on experience training and deploying production-grade ML models.
Experience developing multimodal LLMs or generative models.
Production-level experience with a compiled language (e.g., Swift, C++).
Expertise in one or more areas: computer vision, machine learning, multimodal LLMs, Reinforcement Learning, Agentic AI.
PhD in Computer Science, Electrical Engineering, or a related field with a focus on computer vision, machine learning, or multimodal systems.
Demonstrated problem-solving ability, a strong sense of ownership, and a track record of shipping products.
