Data Scientist

Sciforium · San Francisco, CA

About The Position

Sciforium is seeking a highly analytical, systems-aware Data Scientist to design, develop, and refine the next generation of AI models running on our large-scale compute clusters. In this role, you will bridge the gap between theoretical research and production-grade performance: you will not only build state-of-the-art LLMs and generative models but also ensure they are architecturally optimized for distributed training environments. This position is ideal for a scientist who thinks deeply about algorithmic efficiency, convergence stability, and how model architecture affects hardware utilization. You will play a pivotal role in defining the intelligence that powers Sciforium’s core offerings.

Requirements

  • 5+ years of industry experience in Data Science, Machine Learning Research, or a closely related field, with a strong emphasis on deep learning.
  • Bachelor’s or Master’s degree in Computer Science, Statistics, Mathematics, or another quantitative discipline.
  • Expert-level Python skills with deep proficiency in PyTorch or JAX.
  • Demonstrated experience training and deploying large-scale models (e.g., LLMs, diffusion models) in distributed production environments.
  • Deep understanding of distributed training paradigms, including data parallelism, pipeline parallelism, and tensor parallelism.
  • Strong mathematical foundation in linear algebra, calculus, and optimization, particularly as applied to neural network training and convergence.
  • Experience working with data-at-scale tooling, such as Spark, Ray, or high-throughput data loading frameworks.

Nice To Haves

  • PhD in a relevant field, with publications at top-tier conferences (e.g., NeurIPS, ICML, ICLR).
  • Hands-on experience with Mixture-of-Experts (MoE) architectures, including routing and load-balancing challenges.
  • Familiarity with RLHF workflows, including PPO and DPO fine-tuning pipelines.
  • Knowledge of quantization techniques and low-precision formats (e.g., FP8, INT8, AWQ) and their impact on training stability and inference performance.
  • Contributions to open-source ML libraries or involvement in high-profile LLM releases.

Responsibilities

  • Model Architecture Design: Develop and experiment with novel architectures for LLMs and generative AI, focusing on maximizing performance-per-watt and training throughput.
  • Large-Scale Training Execution: Lead the end-to-end training runs of foundation models, monitoring loss curves, stability, and convergence across massive multi-node clusters.
  • Optimization & Scaling Laws: Apply scaling laws to predict model performance and optimize hyperparameters, tokenization strategies, and objective functions for trillion-parameter regimes.
  • Data Engineering & Curation: Build and maintain sophisticated data pipelines that handle petabyte-scale pre-training datasets, ensuring high-quality signal through advanced filtering and deduplication.
  • Algorithmic Profiling: Collaborate with the Training Engineering team to profile how specific model layers (e.g., attention mechanisms, MoE layers) interact with GPU/accelerator memory and interconnects.
  • Evaluation & Benchmarking: Design robust evaluation frameworks to measure model capability across reasoning, coding, and creative tasks, ensuring alignment with safety and performance standards.
  • Cross-Functional Collaboration: Partner with Infrastructure and Kernel engineers to co-design features that improve training efficiency and model FLOPs utilization (MFU).

Benefits

  • Medical, dental, and vision insurance
  • 401(k) plan
  • Daily lunch, snacks, and beverages
  • Flexible time off
  • Competitive salary and equity