About The Position

Are you excited about the impact that optimizing deep learning models can have on enabling transformative user experiences? The field of ML compression research continues to grow rapidly, and new techniques for quantization, pruning, and related methods are increasingly available to be ported and adopted by the ML developer community, which is looking to ship more models within a constrained memory budget and make them run faster. We are passionate about productizing and pushing the envelope of state-of-the-art model optimization algorithms to further compress and speed up the thousands of deep learning models shipping as part of Apple internal and external apps, running locally on millions of Apple devices.

We work on a Python library that implements a variety of training-time and post-training quantization algorithms, provides them to developers as simple-to-use, turnkey APIs, and ensures that these optimizations work seamlessly with the Core ML inference stack and Apple hardware. We collaborate heavily with researchers at Apple, ML software and hardware architecture teams, and external and internal product teams shipping state-of-the-art optimized models on Apple devices.

If you are excited about making a big impact and playing a critical role in the design and development of a relatively new model optimization library, this is a great opportunity for you. We are looking for someone who is highly self-motivated and passionate about optimizing models for on-device execution. If you have a proven track record of applied deep learning research in model compression, writing high-quality code, and shipping software, we strongly encourage you to apply.

We develop, prototype, and productize state-of-the-art algorithms for neural network model compression. Our algorithms are implemented using PyTorch, and optimizations are geared towards efficient deployment via Core ML. We optimize models across domains, including NLP, vision, and text and image generative models. Our APIs are available to Core ML users, both internal to Apple and external developers, via the Core ML Tools optimization submodule.

Requirements

  • Demonstrated ability to design user friendly and maintainable APIs.
  • A deep understanding of model compression and quantization research.
  • Experience in training, fine tuning, and optimizing neural network models.
  • Experience as a primary contributor to a model optimization/compression library.
  • Good communication skills, including the ability to communicate with cross-functional audiences.

Responsibilities

  • Implement the latest model compression algorithms from research papers in the optimization library.
  • Apply these algorithms to models critical for deployment, and test them on various architectures such as diffusion models and large language models.
  • Set up and debug training jobs, datasets, evaluation, and performance benchmarking pipelines.
  • Apply training-time and post-training compression techniques.
  • Ramp up quickly on new training code bases and run experiments.
  • Understand hardware capabilities and incorporate them into the design and enhancement of optimization algorithms.
  • Keep up with the latest AI research and present recent papers in the field of model compression to the team.
  • Collaborate with researchers, hardware and software engineers to co-develop and discover ideas and optimizations for critical models to be deployed on specific hardware.
  • Run detailed experiments and ablation studies to profile algorithms across various models, tasks, and model sizes.
  • Improve model optimization documentation, and write tutorials and guides.
  • Self-prioritize and adapt to changing priorities and requests.

What This Job Offers

Career Level

Mid Level

Industry

Computer and Electronic Product Manufacturing

Number of Employees

5,001-10,000 employees
