About The Position

TetraScience is the Scientific Data and AI company. We are catalyzing the Scientific AI revolution by designing and industrializing AI-native scientific datasets, which we bring to life in a growing suite of next-generation lab data management solutions, scientific use cases, and AI-enabled outcomes. TetraScience is the category leader in this vital new market. In the last year alone, the world's dominant players in compute, cloud, data, and AI infrastructure have converged on TetraScience as the de facto standard, entering into co-innovation and go-to-market partnerships (see Latest News and Announcements | TetraScience Newsroom).

In connection with your candidacy, you will be asked to carefully review the Tetra Way letter, authored by Patrick Grady, our co-founder and CEO. The letter is designed to help you determine whether TetraScience is the right fit for you from a values and ethos perspective. It is impossible to overstate the importance of this document; you are encouraged to take it literally and reflect on whether you are aligned with our unique approach to company and team building. If you join us, you will be expected to embody its contents each day.

The Senior Product Manager, Model Infrastructure & Execution Services will lead the strategy for how we orchestrate, deploy, and monitor machine learning workloads. You will own the "Compute & Execution" layer of our platform, ensuring that scientific teams can move from raw data to a trained model, and finally to a production-grade inference endpoint, with zero friction. Your mission is to build a world-class developer experience (DX) for ML and AI. You will focus on the "plumbing" that makes AI possible: elastic training environments, high-performance inference services, and the critical metadata layers (lineage and observability) that ensure scientific reproducibility in a regulated environment.

This is a platform role. You aren't building the models; you are building the high-scale machinery that allows Biopharma enterprises to develop and run them at petabyte scale.

Requirements

  • 7+ years of Technical Product Management experience, specifically within cloud infrastructure, backend services, or developer platforms.
  • Deep understanding of the ML Lifecycle: You should be intimately familiar with the infrastructure requirements for both model training (e.g., job scheduling, distributed compute) and inference (e.g., autoscaling, REST/gRPC APIs).
  • Infrastructure Fluency: Strong background in container orchestration (Kubernetes), cloud providers (AWS/Azure), and CI/CD pipelines.
  • Platform Mindset: A track record of building "internal products" or APIs where the primary customer is a developer or a data scientist.
  • Education: Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field.

Nice To Haves

  • Ability to work Eastern Time Zone hours
  • Experience with MLOps frameworks (e.g., Kubeflow, MLflow, or SageMaker) at a Series B-D scale.
  • Knowledge of Infrastructure-as-Code (Terraform) and observability stacks (Prometheus/Grafana/Datadog).
  • Background in Life Sciences or Biopharma, understanding the nuances of GxP or regulated data environments.

Responsibilities

  • Dual Service Strategy (Inference & Training): Define the roadmap for two core service pillars:
      • Training Services: Orchestrating elastic, cost-optimized compute (GPU/CPU) for model training and experiment tracking.
      • Inference Services: Managing the deployment of models into high-availability, low-latency API endpoints.
  • Ease of Development & Deployment: Radically simplify the user experience for ML Engineers. You will build self-service "push-button" deployment workflows that abstract away the complexity of Kubernetes and cloud networking.
  • Lineage & Reproducibility: Ensure every model has a clear "paper trail." You will define how we capture the lineage between data versions, training code, and production artifacts—a critical requirement for Biopharma compliance.
  • Observability & Governance: Build the tools to monitor model health in production. This includes infrastructure-level metrics (latency/memory) and model-level observability (drift/performance) to ensure system reliability.
  • Technical Stakeholder Engagement: Partner with Scientific IT and Platform Engineering to ensure our services integrate seamlessly with existing enterprise identity (IAM) and security frameworks.
  • Backlog & Execution: Act as the "CEO of the Service," translating complex infrastructure needs into clear, actionable epics and user stories for a high-performing engineering team.

Benefits

  • 100% employer-paid benefits for all eligible employees and immediate family members
  • Unlimited paid time off (PTO)
  • 401(k)
  • Flexible working arrangements - Remote work
  • Company paid Life Insurance, LTD/STD
  • A culture of continuous improvement where you can grow your career and get coaching