Scribd, Inc. is seeking a Software Engineer II with deep experience building event-driven, distributed, and scalable systems in Python. In this role, you will design and optimize large-scale data and service pipelines running on AWS, supporting Scribd’s content enrichment and metadata systems. You will work closely with cross-functional teams to design reliable backend services that integrate machine learning models and LLM-based components where needed. This role offers the opportunity to work on generative AI and metadata enrichment problems at global scale.

The ML Data Engineering team powers metadata extraction, enrichment, and content understanding across all Scribd brands. They process hundreds of millions of documents and billions of images, and deliver high-quality metadata that enables content discovery and trust for millions of users worldwide. Their systems support diverse datasets such as user-generated content (UGC), ebooks, audiobooks, and more. They work at the intersection of machine learning, data engineering, and distributed systems, collaborating closely with applied research and product teams to deploy scalable ML and LLM-powered solutions in production.
Job Type: Full-time
Career Level: Mid Level