Data Scientist

Stefanini Group • San Francisco, CA

10h

About The Position

Stefanini Group is hiring! Stefanini is looking for a Data Scientist in San Francisco, CA.

For quick apply, please reach out to Prakhar Goyal: (248) 263-5255 / [email protected]

Open for W2 only!

Responsibilities: You'll be the AI/ML subject matter expert, splitting your time between:

50% - Consulting with internal teams (economists, analysts) to design and implement AI solutions for their use cases
25% - Building and maintaining CDP's core AI/ML models and frameworks
25% - Providing technical support and troubleshooting for AI/ML systems

You'll work in a collaborative environment using cutting-edge technologies including Databricks, AWS, Collibra, Data Mesh architecture, and PySpark to build scalable, production-ready AI systems. This is a foundational role: you'll establish our MLOps practices, GenAI frameworks, and production AI capabilities from the ground up in a highly regulated Federal environment.

What You'll Bring

Consulting & Enablement (50%)

Your number one job will be to help advise economists and business teams on appropriate modeling approaches based on their use cases
Advise on appropriate modeling approaches for diverse scenarios: RAG/knowledge bases, anomaly detection, document understanding, audit analysis
Bridge the gap between econometric models (R, Stata) and production ML pipelines
Review and provide feedback on AI/ML architectural proposals
Train data engineers and business users on AI/ML best practices

Model Development (25%)

Build production-ready AI systems for document processing (PDF, XLSX, DOCX, CSV, etc.)
Develop and deploy 1-2 RAG/knowledge base systems in the first year
Create reusable GenAI frameworks and patterns for the organization
Implement solutions using AWS AI services (Bedrock, SageMaker, Textract, etc.) and Databricks
Ensure models meet explainability requirements for regulated environments

MLOps & Support (25%)

Establish an MLOps framework and model deployment patterns
Troubleshoot model performance issues (accuracy, latency, cost)
Act as the escalation point for AI/ML technical issues
Train users by providing models, documentation, and consulting
Monitor and maintain production models
Stay current on AI/ML techniques and Federal regulatory requirements
Help other Support Team members advance their knowledge of Data Science and modeling

Requirements

Deep expertise in search, information retrieval, and ranking systems at scale
Strong understanding of neural search architectures, ML/AI, and generative models
ML model development, implementation, and evaluation
Experience in applying LLMs and agentic AI techniques to production systems
Demonstrated ability to translate technical solutions into business impact
Excellent cross-team collaboration and communication skills
Master's degree in Data Science, Statistics, Computer Science, Mathematics, or related quantitative field
4+ years in data science, ML engineering, or AI development roles
Proven track record building and deploying ML/AI models in production environments
Strong Python proficiency; experience with SQL and at least one statistical language or tool (R, Stata, MATLAB, sparklyr)
Hands-on experience with modern ML frameworks (scikit-learn, TensorFlow, PyTorch, Hugging Face)
Practical experience with LLMs, RAG architectures, and prompt engineering
Experience processing and extracting insights from unstructured documents at scale
Working knowledge of AWS AI/ML services (SageMaker, Bedrock preferred)
Ability to explain complex AI concepts to non-technical stakeholders and translate business problems into technical solutions
Experience with our tech stack (Databricks, AWS AI/ML tools, Starburst) is preferred

Responsibilities

Consulting with internal teams (economists, analysts) to design and implement AI solutions for their use cases
Building and maintaining CDP's core AI/ML models and frameworks
Providing technical support and troubleshooting for AI/ML systems
Advise economists and business teams on appropriate modeling approaches based on their use cases
Advise on appropriate modeling approaches for diverse scenarios: RAG/knowledge bases, anomaly detection, document understanding, audit analysis
Bridge the gap between econometric models (R, Stata) and production ML pipelines
Review and provide feedback on AI/ML architectural proposals
Train data engineers and business users on AI/ML best practices
Build production-ready AI systems for document processing (PDF, XLSX, DOCX, CSV, etc.)
Develop and deploy 1-2 RAG/knowledge base systems in the first year
Create reusable GenAI frameworks and patterns for the organization
Implement solutions using AWS AI services (Bedrock, SageMaker, Textract, etc.) and Databricks
Ensure models meet explainability requirements for regulated environments
Establish an MLOps framework and model deployment patterns
Troubleshoot model performance issues (accuracy, latency, cost)
Act as the escalation point for AI/ML technical issues
Train users by providing models, documentation, and consulting
Monitor and maintain production models
Stay current on AI/ML techniques and Federal regulatory requirements
Help other Support Team members advance their knowledge of Data Science and modeling