English Data Collection Project – Real-World Content

Summa Linguae Technologies
San Diego, CA
Remote

About The Position

We are seeking qualified contributors and data collection vendors to support a large-scale multilingual real-world data collection project. This is a paid opportunity (payment per asset), and you may use your existing real-world data as long as it meets project requirements. The goal of this project is to gather authentic, naturally occurring content, not synthetic or generated data, across multiple digital sources and languages.

⚠️ Synthetic or AI-generated data will not be accepted. Personally Identifiable Information (PII) will be redacted.

Contributors will be asked to collect content from the following asset types:

  • Emails (Mail)
  • Messages (SMS, chat apps, etc.)
  • Notes
  • Files
  • Voicemail / Audio meeting notes (transcripts are acceptable)
  • Webpages
  • Screenshots
  • Photos (camera captures)

The content should naturally include:

  • Contact information (will be redacted)
  • Event information
  • People and relationships
  • Topics of interest (listed below)

Content must cover a broad range of real-world topics, including but not limited to:

  • Sports
  • Movies & TV (films, series, Anime/Manga)
  • Food (everyday meals, street food, regional cuisines)
  • Music (songs, artists, albums, soundtracks)
  • Recreational activities (arts, crafts, photography, hobbies)
  • Automotive (cars, motorcycles, public transport)
  • Technology (computers, AI, crypto, emerging tech)
  • Travel & leisure (trip planning, dining, events)
  • Weather & natural events
  • Health & fitness
  • Finance (banking, stocks, budgeting)
  • Government & politics
  • School, College & Work life
  • Traditions & local celebrations
  • Gaming (video games and gameplay discussions)

Requirements

  • Only real, naturally generated data is allowed
  • No synthetic, staged, or AI-generated content
  • Content must be relevant, diverse, and contextually rich
  • Contributors must follow the provided data quality and compliance guidelines
  • Freelancers with experience in data collection, annotation, or AI data projects
  • Native speakers of the target languages
  • Individuals who can source authentic, real-world digital content responsibly and ethically

Responsibilities

  • Collect content from Emails (Mail)
  • Collect content from Messages (SMS, chat apps, etc.)
  • Collect content from Notes
  • Collect content from Files
  • Collect content from Voicemail / Audio meeting notes (transcripts are acceptable)
  • Collect content from Webpages
  • Collect content from Screenshots
  • Collect content from Photos (camera captures)

Benefits

  • Receive a bonus for each successful referral: invite your family and friends.
  • Be part of a large-scale global AI initiative and get rewarded for your authentic digital footprint.

What This Job Offers

Career Level

Entry Level

Education Level

No Education Listed

Number of Employees

1-10 employees