About The Position

ByteDance is seeking talented individuals to join the Doubao (Seed) Team. Founded in 2023, the team is dedicated to pioneering advanced AI foundation models, focusing on cutting-edge research that drives technological and societal advancement. Its research areas include deep learning, reinforcement learning, language, vision, audio, AI infrastructure, and AI safety. The team operates across multiple locations, including China, Singapore, and the US, and leverages substantial data and computing resources to develop proprietary general-purpose models with multimodal capabilities. The Doubao models power over 50 ByteDance apps and business lines in the Chinese market and are available to external enterprise clients via Volcano Engine. The Doubao app is recognized as the most widely used AIGC application in China.

The Machine Learning (ML) System sub-team combines system engineering with machine learning to develop and maintain massively distributed ML training and inference systems and services globally. The team is responsible for providing high-performance, highly reliable, and scalable systems for LLM/AIGC/AGI. Team members will have the opportunity to build large-scale heterogeneous systems integrating GPU/NPU/RDMA/Storage, ensuring stability and reliability while deepening their expertise in coding, performance analysis, and distributed systems. They will also take part in decision-making and collaborate with a global team across the United States, China, and Singapore.

Responsibilities

  • Participating in online architecture design and optimization centered around deep model inference tasks, achieving high concurrency and throughput in large-scale online systems.
  • Participating in the establishment of a comprehensive system covering stability, disaster recovery, R&D efficiency, and cost, enhancing overall system stability.
  • Participating in the design and implementation of end-to-end online pipeline systems with multiple models, plugins, and storage-computation components, enabling agile, flexible, and observable continuous delivery.
  • Collaborating closely with machine learning engineers (MLEs) to optimize algorithms and systems.
  • Being proactive, optimistic, and highly responsible, demonstrating a meticulous work ethic, and possessing strong team communication and collaboration skills.