About The Position

NVIDIA is at the forefront of the AI revolution, and our research is shaping the future of large language models. We are looking for a Senior Scientist to join our team and help advance our capabilities in synthetic data generation and privacy-preserving AI. You will contribute to open-source libraries within the NVIDIA NeMo ecosystem that enable high-quality synthetic data generation at scale while ensuring data privacy. This role combines hands-on software engineering with research in privacy-enhancing methods, and you will collaborate with research, engineering, and product teams as well as external labs.

Requirements

  • PhD in Computer Science, Machine Learning, Statistics, or a related field, or equivalent experience.
  • 5+ years of research experience in synthetic data generation, data privacy, or related areas such as differential privacy, federated learning, or trustworthy machine learning, or comparable experience.
  • Proven track record of developing or maintaining software libraries used by a broad developer community.
  • Deep technical understanding of PyTorch and the Hugging Face Transformers ecosystem, including PEFT and LoRA.
  • Technical familiarity with LLM inference frameworks such as vLLM or TGI.
  • Strong publication record at premier venues such as NeurIPS, ICML, ICLR, ACL, or similar.

Nice To Haves

  • Active contributions to open-source projects, particularly in ML, security, or privacy domains.
  • Specialized expertise in differential privacy concepts and tools such as Opacus.
  • Ability to build and optimize scalable data processing pipelines for large-scale models.
  • Proficiency with NER-based PII detection and advanced anonymization techniques.
  • Functional knowledge of global privacy regulations such as GDPR or CCPA.

Responsibilities

  • Design and build pipelines for generating synthetic datasets using LLM-based methods, along with automated frameworks for evaluating data quality.
  • Research and implement privacy-preserving techniques such as differentially private training (DP-SGD), NER-based detection and replacement of sensitive information, and defenses against membership inference attacks.
  • Design and maintain open-source software libraries and SDKs with clean APIs, developer-facing documentation, and robust software design patterns.
  • Drive software excellence through modern development tooling, configuration-driven architecture, and professional Git and CI/CD workflows.
  • Publish original research at top machine learning and AI conferences to maintain NVIDIA's technical leadership.
  • Mentor interns and junior researchers to foster technical growth within the team.

Benefits

  • You will also be eligible for equity and benefits.