The Content Foundations team builds the systems that power how content enters, evolves, and is delivered across Scribd. This includes ingestion, metadata extraction, early quality controls, and the core artifacts that power search, recommendations, AI/ML systems, and the reading and listening experience.

You'll be joining a small and growing team working at the boundary between messy, real-world content and highly structured systems, where file formats vary, metadata can be inconsistent, and scale amplifies every edge case. Scribd operates a hybrid catalog of premium publisher content and user-generated uploads, spanning diverse formats, decade-old systems, and modern services evolving alongside them. Decisions made at ingestion ripple across every downstream system.

Current focus areas include: content quality and early-stage validation; spam detection at upload time; OCR and content extraction for ML/LLM use cases; evolving content formats to support downstream AI workflows; security hardening in partnership with Content and Infra-Security; and architectural improvements to how content and metadata flow across systems, including better data observability for complex, asynchronous pipelines.
Job Type
Full-time
Career Level
Senior
Education Level
No Education Listed