Data Scientist II

Scribd • San Francisco, CA

Hybrid

About The Position

The Applied Research team is a group of data scientists and content specialists who are experts in leveraging machine learning, natural language processing, and generative AI models to develop solutions that deliver value to our users and business. We act as a key driver of innovation, whether in product surface experimentation, metadata generation, or model development. Along with Product and Engineering partners, we design solutions and collaborate in cross-functional squads to maximize business impact. Our areas of impact include content enrichment, representation learning, recommendations, search, translation, and many others, applied to diverse media across text, image, and audio. We operate at a scale of hundreds of millions of documents, millions of users, and billions of user interactions.

We are seeking a Data Scientist II with experience developing and deploying machine learning models. You will help design and implement high-impact AI and ML systems, working in cross-functional teams with Machine Learning Engineers, Data Engineers, and Product. The ideal candidate is curious and collaborative, has an eye for simplicity, end-to-end visibility, and impact, and is excited about building models on massive amounts of data, working with language models, and deploying models.

Requirements

3+ years of post-qualification experience developing machine learning models, working with systems at scale, and deploying to production environments.
Proficiency in Python.
Hands-on experience building ML pipelines and working with distributed data processing frameworks like Apache Spark, Databricks, or similar.
Intermediate-level proficiency in at least three of these fields: classification algorithms, natural language processing, search, information retrieval, named entity recognition, deep learning, generative models.
Intermediate-level or greater experience with SQL or PySpark.
Bachelor's or Master's degree in a relevant quantitative discipline, including but not limited to Statistics, Computer Science, Data Science, Artificial Intelligence, or another field with a strong quantitative focus.

Responsibilities

Focus on a variety of content classification use cases, leveraging everything from traditional NLP to sophisticated LLMs and generative models
Investigate methods for solving Scribd's most challenging problems at scale
Collaborate with other Data Scientists, Machine Learning Engineers and ML Data Engineers on cross-functional projects
Leverage any algorithm at your disposal, from classical models built with scikit-learn and NumPy to custom neural networks in PyTorch to third-party LLM APIs
Process massive amounts of data with Python, SQL and Spark
Align with stakeholders on the approaches and results of projects through written and verbal communication, and write detailed, accurate, and concise project documentation

Benefits

Scribd Flex (flexible work model)
Comprehensive health, dental, and vision coverage
Mental health support and disability coverage
Generous paid time off, including vacation, sick time, holidays, winter break, volunteer time, and sabbaticals
Paid parental leave and family support benefits
Retirement matching and employee equity
Learning and development programs and professional growth opportunities
Wellness and home office stipends
Complimentary access to the Scribd, Inc. suite of products
Enterprise access to leading AI tools