Business Unit
What the Role Entails
Job Responsibilities:
1. Track the latest research in speech generation algorithms, explore next-generation paradigms for speech and audio generation, and push the boundaries of speech generation capabilities.
2. Investigate cutting-edge multimodal voice foundation model technologies to enhance voice interaction experiences by integrating text, speech, and vision.
3. Lead the technical R&D of voice foundation models, driving model performance improvements and innovative applications.
Who We Look For
Job Requirements:
1. Master’s or Ph.D. in Computer Science, Artificial Intelligence, Electronic Engineering, Signal Processing, or a related field.
2. Research or development experience in one or more of the following areas: voice foundation models, speech synthesis, speech recognition, audio generation, voice conversion, or speech codecs.
3. Familiarity with mainstream voice-enabled large models (e.g., GPT-4o, GLM-4-Voice, Qwen2.5-Omni, Voila). Prior project experience is preferred.
4. Proficiency in deep learning frameworks (e.g., PyTorch). Experience with large-scale model training frameworks (e.g., Megatron, DeepSpeed) is a plus.
5. Solid understanding of large model architectures and principles. Experience in large-scale pretraining or post-training is preferred.
Job Type
Full-time
Career Level
Mid Level
Number of Employees
5,001-10,000 employees