Senior Software Engineer, ML Platform

Parafin • San Francisco, CA

About The Position

We’re looking for a senior software engineer to join Parafin’s Infrastructure team and lead the evolution of our ML Platform. This role is critical to building reliable, scalable, and developer-friendly systems for model experimentation, training, evaluation, inference, and retraining that power underwriting and other ML-driven products for small businesses. As a Senior Software Engineer, you’ll design, build, and maintain the core abstractions and platforms that let data scientists ship high-quality models to production safely and quickly. You’ll partner closely with Data Science and Platform Engineering, own the ML platform end-to-end, and develop batch and real-time underwriting infrastructure.

Requirements

5+ years of software engineering experience, including work on ML platform/MLOps systems (training, deployment, and/or feature pipelines).
Strong Python; solid software design and testing fundamentals. Proficiency with SQL; hands-on Spark/PySpark experience.
Knowledge of ML fundamentals—probability & statistics, supervised vs. unsupervised learning, bias/variance & regularization, feature engineering, model evaluation metrics, validation strategies, and production concerns like drift, stability, and monitoring.
Expertise with modern data/ML stacks—AWS, Databricks (workflows, lakehouse, MLflow/registry, Model Serving), and Airflow (or equivalent orchestration).
Experience building real-time systems (service design, caching, rate limiting, backpressure) and batch pipelines at scale.
Practical knowledge of feature-store concepts (offline/online stores, backfills, point-in-time correctness), model registries, experiment tracking, and evaluation frameworks.
Strong problem-solving skills and a proactive attitude toward ownership and platform health.
Excellent communication and collaboration skills, especially in cross-functional settings.

Nice To Haves

Databricks experience (MLflow, Model Serving).
Experience with feature stores (e.g., Tecton, Feast) and streaming (Kafka/Kinesis).
Experience with fintech, risk, or underwriting systems; familiarity with model safety checks, rejection/override flows, and auditability.
Background with A/B testing platforms, shadow/canary deployments, and automated rollback.
Experience with low-latency inference systems.

Responsibilities

Turn notebooks into software. Decompose data scientist training/inference notebooks into reusable, tested components (libraries, pipelines, templates) with clear interfaces and documentation.
Create developer-friendly ML abstractions. Build SDKs, CLIs, and templates that make it simple to define features, train/evaluate models, and deploy to batch or real-time targets with minimal boilerplate.
Build our real-time ML inference platform. Stand up and scale low-latency model serving.
Expand batch ML inference. Improve scheduling, parallelism, cost controls, observability, and failure handling/rollback for large-scale batch scoring and post-processing.
Own and expand the feature store. Design offline/online feature definitions, ensure high read/write throughput, and maintain consistent offline/online semantics.
Platform reliability and observability. Instrument training/inference for latency, throughput, accuracy, drift, data quality, and cost; build alerting and dashboards; drive incident response and postmortems.
Underwriting infrastructure partnership. Support production batch and real-time underwriting systems alongside Data Science, and collaborate on model interfaces, SLAs, safety checks, and product integrations.