The ML Data Engineering team powers metadata extraction, enrichment, and content understanding across all Scribd brands. We process hundreds of millions of documents, billions of images, and deliver high-quality metadata to enable content discovery and trust for millions of users worldwide. Our systems operate at massive scale, supporting diverse datasets like user-generated content (UGC), ebooks, audiobooks, and more. We work at the intersection of machine learning, data engineering, and distributed systems, collaborating closely with applied research and product teams to deploy scalable ML and LLM-powered solutions in production. We’re seeking a Software Engineer II with strong backend development experience and a passion for solving complex data challenges at scale. In this role, you’ll design, build, and optimize distributed systems that extract, enrich, and process metadata for a wide range of content. You’ll work closely with ML engineers, product managers, and cross-functional partners to integrate machine learning models and LLM-based services into production pipelines and deliver impactful, high-performance solutions. This role offers the opportunity to work on cutting-edge generative AI and metadata enrichment problems at a truly global scale.
Stand Out From the Crowd
Upload your resume and get instant feedback on how well it matches this job.
Job Type
Full-time
Career Level
Mid Level