Data Scientist

CreatorIQ
San Francisco, CA
Hybrid

About The Position

As a Data Scientist on our Data Science team, you will play a pivotal role in extracting meaning from the millions of pieces of content generated by creators every day. You will focus primarily on improving robust, scalable classification and language models that categorize content, detect trends, and ensure brand safety. While your core focus will be on foundational Machine Learning and NLP, you will also have the opportunity to experiment with LLM-backed and agentic applications.

Requirements

  • NLP Practitioner: You have 2-4 years of experience building and deploying NLP models. You are comfortable with concepts like tokenization, word embeddings, and topic modeling.
  • Machine Learning Foundation: You have a strong grasp of classical ML algorithms (Random Forest, XGBoost, SVM) and know when to use a simple logistic regression versus a deep learning approach.
  • Python Proficiency: You are fluent in Python and its data stack (pandas, NumPy, scikit-learn). Experience with libraries like Hugging Face or spaCy is a major plus.
  • Curious about LLMs: While you are grounded in traditional ML, you have a working knowledge of LLM APIs (OpenAI, Anthropic) and prompt engineering, and you are eager to learn how to integrate them into production workflows.
  • Data Driven: You are comfortable writing complex SQL queries to pull your own data and verify your hypotheses.
  • Team Player: You can explain complex technical concepts to non-technical stakeholders and collaborate effectively with MLOps engineers to get your models into production.

Responsibilities

  • Build & Refine Classifiers: Develop and maintain multi-class text classification models to categorize creator content, brand mentions, and sentiment with high precision and recall.
  • Analyze Content Semantics: Utilize Natural Language Processing (NLP) techniques (topic modeling, sentiment analysis, entity extraction) to structure unstructured data from social platforms like TikTok, Instagram, and YouTube.
  • Bridge ML and GenAI: Experiment with Large Language Models (LLMs) to augment training data, perform few-shot classification, or summarize complex creator data, interfacing with the engineering team to bring these concepts to life.
  • Own the Data Lifecycle: Write efficient code to clean, preprocess, and tokenize large text datasets, ensuring high-quality inputs for your models.
  • Measure & Optimize: Partner with MLOps to continuously evaluate model performance using standard metrics (F1 score, AUC-ROC) and reduce latency for real-time inference.
  • Build & Integrate: Collaborate with engineering to integrate measurement loops into our broader infrastructure (AWS/GCP), ensuring our model lifecycle is automated and observable.

Benefits

  • People: work with talented, collaborative, and friendly people who love what they do.
  • Guidance: use our learning platform to get the training and tools you’ll need to succeed here from your first day with us.
  • Surprise meal stipends: work from home can’t stop the enjoyment of someone else making a meal for you!
  • Work/life harmony: 15 days vacation, floating and set holidays, wellness allowance, and paid parental leave.
  • Whole Health Package: medical, dental, vision, life, disability insurance, and more.
  • Savings: a 401(k) plan (USA) to help you plan ahead.
  • Work from home stipend: to assist you in setting up a home office that works for you (or buy a new dog leash - your choice!).