Audio Algorithm Engineer (Speaker Diarization) - San Francisco

Plaud • San Francisco, CA

50d • $150,000 - $180,000 • Hybrid

About The Position

About Plaud Inc.

Plaud is building the world's most trusted AI work companion for professionals to elevate productivity and performance through note-taking solutions, loved by over 1,500,000 users worldwide since 2023. With a mission to amplify human intelligence, Plaud is building the next-generation intelligence infrastructure and interfaces to capture, extract, and utilize what you say, hear, see, and think.

Plaud Inc. is a Delaware-incorporated, San Francisco-based company pushing the boundary of human–AI intelligence through a hardware–software combination. With SOC 2, HIPAA, GDPR, ISO27001, ISO27701, and EN18031 compliance, Plaud is committed to the highest standards of data security and privacy protection.

To learn more about Plaud, please visit https://www.Plaud.ai and follow along on Instagram, X, Facebook, LinkedIn, and YouTube.

Why You Should Join Us

Plaud is building the next-generation intelligence infrastructure and interfaces to capture, extract, and utilize intelligence from what people say, hear, see, and think. Plaud is a bootstrapped, skyrocketing, profitable company with a $250M revenue run rate achieved in just three years.

Define the next-gen paradigm for human-AI interaction.
Gain exposure to cutting-edge AI for Pro tools and play a direct role in our global expansion.
Work with passionate teammates who value innovation, collaboration, and customer success.
Grow your career in a culture that champions continuous learning and fast career development.
Market-competitive compensation, global exposure, and a vibrant, creativity-fueled work atmosphere.

Requirements

3 to 5 years of experience training speech algorithms, including fine-tuning and training SpeechLLM models.
Experience processing hundreds of thousands of hours of speech data and training speech recognition models.
Familiarity with SpeechLLM and speech self-supervised learning (SSL) training, with experience training models similar to StepAudio, Qwen3omni, etc. from scratch. Individual contributors who were responsible for model training on teams such as the StepAudio speech group are preferred.
Publications at top speech conferences such as Interspeech or ICASSP, or speech-related patents.

Responsibilities

For the multi-language ASR system, survey the research literature for approaches to optimizing the terminology thesaurus, and design sound terminology filtering and hotword optimization solutions.
Implement multi-language hotword algorithms based on SpeechLLM and optimize their performance; collaborate with the engineering team to deploy the hotword recognition solution.
Fine-tune the speech recognition model on scenario-specific data to improve ASR recognition performance across multiple languages and industries.
Build a test set and evaluation system for keyword recognition and industry-specific recognition engines, and evaluate how open-source models and commercial APIs perform on terminology recognition and industry-specific recognition.

Benefits

Founding Team: Opportunity to join the founding team of this new initiative, with meaningful ownership and impact on a fast-growing startup.
Competitive Compensation: $150K-$180K base salary + performance bonus + equity.
Comprehensive Benefits: Top-tier healthcare for employees and dependents, including dental and vision, and a generous employer subsidy.
Retirement Planning: 401(k) plan for full-time employees with company matching.
Paid Time Off: Unlimited PTO, plus 13 paid holidays.
New Parent Leave: 12 weeks of paid time off to spend time with your new family, regardless of gender.
Hybrid Office: Minimum of 3 days in office per week.
Gear: New hires are equipped with their choice of new top-of-the-line laptops and workstation setups.
Perks: Best office equipment. Annual offsites. Free office drinks and snacks.
