Software Engineer (Backend), Content Foundations

Scribd, Inc.•Vancouver, BC

$103,500 - $196,000•Hybrid

About The Position

The Content Foundations team builds the systems that power how content is uploaded, processed, and delivered across Scribd products. This includes ingestion, metadata extraction, early quality controls, and the core artifacts that power search, recommendations, AI/ML systems, and the reading and listening experience. You'll be joining an experienced team working at the boundary between messy, real-world content and highly structured systems, where file formats vary and metadata inconsistencies become amplified at scale. Scribd operates a hybrid catalog of premium publisher content and user-generated uploads, spanning diverse formats, decade-old systems, and modern services evolving alongside them. Your contributions will directly impact downstream teams and ultimately our customers.

Current focus areas include:

Improvements to the upload flow
Content quality and early-stage validation
OCR and content extraction for downstream ML/LLM use cases
Evolving content formats to support downstream AI workflows
Making content and metadata more accessible for downstream systems to consume

Requirements

4+ years of professional software engineering experience, including exposure to production-scale systems.
Experience with backend services, data pipelines, or content-processing systems; depth in any one of these is enough.
Comfortable working with messy data and building systems resilient to real-world inputs.
Proficient in at least one of Ruby, Python, or Go, and willing to ramp up on the others (our stack includes all three).
Working familiarity with AWS (e.g., Lambda, SQS/SNS, S3) or similar cloud resources.
Comfortable working with relational databases (SQL).
Clear written and verbal communicator, able to collaborate with teammates and partner teams.
Collaborative and curious, eager to learn from peers and contribute back.

Nice To Haves

Experience with document formats (PDF, ebooks, Markdown) and their internals (parsing, OCR, transformation).
Familiarity with ML/AI systems (embeddings, chunking, retrieval pipelines).

Responsibilities

Contribute to core content systems: Design and implement features within ingestion pipelines, metadata services, and content processing workflows, with guidance from senior engineers and the EM on scope and trade-offs.
Build reliable, observable systems: Implement production-quality services that handle diverse file formats, malformed inputs, retries, asynchronous workflows, and edge cases.
Collaborate across teams: Partner with ML Engineering, Search & Discovery, the Content Library squad, and Product to build systems that balance performance, scalability, and user experience.
Improve content quality and discoverability: Work with ML and Discovery teams to enable improvements in metadata extraction, classification, and enrichment that power personalization and search.
Leverage AI-driven engineering practices: Use LLM-based systems and AI coding agents in your day-to-day work, and share your learnings with the team.
Grow your craft: Learn the domain deeply, take on increasingly ambitious problems, and develop your skills intentionally.

Benefits

Scribd Flex (flexible work model)
Comprehensive health, dental, and vision coverage
Mental health support and disability coverage
Generous paid time off, including vacation, sick time, holidays, winter break, volunteer time, and sabbaticals
Paid parental leave and family support benefits
Retirement matching and employee equity
Learning and development programs and professional growth opportunities
Wellness and home office stipends
Complimentary access to the Scribd, Inc. suite of products
Enterprise access to leading AI tools