About The Position

Python is central to Moon’s roadmap. Our data and ML layer powers core home services workflows: surfacing operational insights for service company owners and enabling predictive features that help users make better decisions. The data is real operational data at meaningful scale; the problems are genuinely interesting, and mistakes have real downstream consequences. This is not a data science internship where you run notebooks in isolation. You’ll ship code that connects to a real backend and reaches real users. The year-round track is intentional: meaningful data and ML work takes time to build, validate, and integrate into a production product. You’ll go deeper here than you could in 12 weeks.

Requirements

  • Solid Python — functions, classes, error handling, and code that someone else can read.
  • Data manipulation with pandas, polars, or equivalent — load a dataset, clean it, answer questions from it without fighting the tools.
  • SQL — non-trivial queries and a real understanding of what a join is doing.
  • AI tool usage that is habitual and specific: you’ve used LLMs to accelerate EDA, write boilerplate, or debug data issues, and you can describe exactly how. This is evaluated explicitly.
  • Genuine intellectual curiosity about data — you want to know why a number looks wrong, not just make the error go away.

Nice To Haves

  • ML library exposure: scikit-learn, PyTorch, or similar. You don’t need production model experience, but you should know what a train/test split is and why it matters.
  • Data pipeline tooling: Airflow, Prefect, dbt, or similar.
  • LangChain, OpenAI/Anthropic API integration, or agent workflow experience.
  • Cloud data services on Azure, AWS, or GCP.
  • FastAPI or Python-based API experience.
  • Statistics coursework — not required, but genuinely useful for the ML work.

Responsibilities

  • Build and maintain Python ETL pipelines: ingestion, transformation, validation, and reporting.
  • Write data validation and quality checks — bad data in production is a customer-facing problem, not a technical inconvenience.
  • Instrument and monitor data pipelines; silent failures are often worse than loud ones.
  • Collaborate with the .NET team on data contracts between systems.
  • Write tests for pipeline outputs and model behavior; data pipelines have bugs just like application code does — they’re just harder to find.
  • Prototype and develop ML features that are in production or under active development, applied to home services operational data.
  • Integrate LLM capabilities into application features using LangChain, direct API calls, or agent orchestration patterns.
  • Use AI tools actively across the whole workflow: EDA, code generation, debugging, documentation, and multi-step automated pipelines. AI-assisted development is your default mode, not an occasional tool.
  • Document data models and transformation logic as part of the definition of done.

Benefits

  • Competitive hourly compensation, tiered by experience (undergraduate and graduate rates; details shared during the process).
  • A dedicated mentor working across data engineering and applied ML — enough runway to see features go from prototype to production over a 6–12 month engagement.
  • Work that ships — features you build will go to production users during the internship.
  • Real code review under the same standards applied to the full-time team — not the kind that approves everything.
  • AI tooling stipend (Cursor Pro, Claude Pro, or equivalent) — the AI-native expectation is real; we remove the financial barrier to getting there.
  • Priority consideration for full-time roles upon graduation.
  • Access to real-world home services operational data — the problems are genuine, not synthetic.
© 2026 Teal Labs, Inc