Senior ML Ops Engineer

Sprout General Referrals

$135,700 - $205,260 • Remote

About The Position

Sprout Social empowers businesses worldwide to harness the immense power and opportunity of social media in today’s digital-first world. Processing over one billion social messages daily, our platform serves up essential insights and actionable information to over 30,000 brands, informing strategic decisions that drive business growth and innovation, and fostering deeper, authentic connections to their end customers. Our full suite of social media management solutions includes comprehensive publishing and engagement functionality, customer care solutions, influencer marketing, connected workflows, and business intelligence. We’re actively weaving AI throughout our products to drive our business’s growth trajectory.

What You’ll Do

Build and maintain infrastructure using AWS, Terraform, and Kubernetes to support AI/ML at scale, including Generative AI applications.
Manage the end-to-end lifecycle of machine learning models, ensuring observability and tooling support both scale and speed.
Execute at scale while staying nimble enough to keep up with new capabilities being offered by social network APIs.
Improve processes and champion ideas that matter while holding the team accountable to high code quality and engineering standards.
Support our AI/ML Scientists by developing tooling to streamline model development and deployment.

What You’ll Bring

We’re looking for a creative, collaborative, pragmatic, highly motivated, and impact-oriented technical leader to join our team in building great software. If you can solve hard problems, deliver quality server-side software, and confidently guide your peers to learn from and teach each other, we’d love to talk with you!

Within 1 month, you’ll plant your roots by:

Completing Sprout’s New Hire training program alongside other new Sprout team members.
Getting acclimated to the team’s current Mission, Goals, and Objectives along with future product roadmaps.
Becoming familiar with the team’s existing deployment patterns and the ML Ops tooling ecosystem.

Within 3 months, you’ll start hitting your stride by:

Decomposing work into small, similarly sized units and working with your squad to prioritize quarterly team goals.
Setting up initial software for deploying and monitoring ML models.
Partnering with the Infrastructure team to deploy an existing ML model in Kubernetes.
Acting as the domain owner for new projects and writing necessary design documents.

Within 6 months, you’ll be making a clear impact through:

Rolling out monitoring and alerting tools to identify problems before they affect users.
Helping deploy new ML models processing hundreds of millions of messages a day.
Identifying technical debt and performance bottlenecks, and executing plans to improve the code.
Collaborating effectively across the organization to ensure big-picture alignment.

Within 12 months, you’ll make this role your own by:

Becoming the go-to expert on ML Ops at Sprout.
Developing repeatable deployment patterns for data scientists to train, deploy, and evaluate batch, REST, and event-based ML services.
Owning cross-organizational projects and mentoring junior engineers to help them level up technically.
Surprising us! Use your unique ideas and abilities to change your team in beneficial ways.

Requirements

5+ years of experience developing and supporting AI/ML software in a production environment.
5+ years of experience programming in object-oriented languages such as Java, Python, or C++.
Impact-oriented mindset with an interest in stability at scale and a willingness to engage in feature development.

Nice To Haves

3+ years of experience developing and supporting scalable, distributed backend services.
3+ years of experience building and supporting GPU-heavy services.
1+ years of experience with LLMs / Generative AI, including managing their unique costs, constraints, and observability challenges.
1+ years of experience with Infrastructure-as-Code (Terraform) and container orchestration (Kubernetes) within AWS environments.

Benefits

Comprehensive Health & Wellness: Premium BCBSIL medical, dental (high/low plans), and vision (Eyemed) insurance for you and your eligible dependents.
Premium Mental Health Support: Full, free access to Modern Health for you and your dependents, including coaching, therapy sessions, and digital wellness resources.
Retirement Savings: 401(k) plan with a 50% company match on your first 6% of contributions (a 3% total match).
Financial Security: 100% employer-paid Life and Disability insurance for your peace of mind.
Flexible Paid Time Off: A flexible PTO policy, supplemented with additional company-wide Rest & Recharge days throughout the year.
Paid Parental Leave: Up to 16 weeks of paid leave for new parents to support you in expanding your family.
Annual Lifestyle Stipend: A $1,000 USD annual Lifestyle Spending Account to spend on your physical, mental, and financial well-being.
Work From Home Support: A one-time $550 USD stipend to set up your home office, plus a monthly $50 USD stipend for internet.
Giving Back: 16 hours of paid volunteer time annually, plus a $100 annual match for your charitable donations.
Additional Financial Perks: Access to pre-tax commuter benefits, subsidized child/eldercare (Care.com), discounted pet insurance (Figo), and no-cost personalized financial wellness support through Your Money Line.