Senior/Principal Artificial Intelligence Models - Hybrid

Sandia Corporation, Albuquerque, NM
72d, Hybrid

About The Position

Sandia's artificial intelligence (AI) team is building the U.S. Department of Energy's (DOE) next-generation AI Platform, an integrated scientific AI capability that delivers rapid, high-impact solutions for national security, science, and applied energy missions. The Platform is based on three pillars: Models, Infrastructure, and Data. You will join the Models Pillar team to architect, develop, and deploy fine-tuned reasoning models, domain foundation models, high-fidelity surrogate models, and autonomous agents. Your work will compress mission timelines by enabling scientists and engineers to explore design spaces, evaluate outcomes, and steer experiments and simulations with transparent, high-assurance AI workflows.

We anticipate multiple hires for the Models Pillar who collectively span the responsibilities and skills described below. New hires will work with existing Sandia staff and teams from other DOE laboratories to deliver on this ambitious, fast-paced project. AI Platform development will draw extensively on existing AI and data science tools, but success will also require considerable innovation and problem solving to address the unique needs of DOE applications. If this sounds like an exciting challenge to you, we look forward to reading your application!

Requirements

  • Bachelor's degree in Computer Science, Electrical Engineering, Mathematics, or a related STEM field plus five (5) years of directly relevant experience, or an equivalent combination of education and experience
  • Ability to obtain and maintain a DOE Q clearance

Nice To Haves

  • Graduate degree in a relevant, computationally intensive discipline that required an independent research project for graduation (e.g., an independent project, thesis, or dissertation).
  • Experience in developing software and AI systems for enterprise and national security applications.
  • Demonstrated software development skills and familiarity with modern software development practices.
  • Proven ability to work and communicate effectively in a collaborative and interdisciplinary team environment.
  • Demonstrated expertise with deep learning frameworks (PyTorch, TensorFlow) and proficiency in Python.
  • Experience with distributed computing frameworks (MPI, Horovod, Ray) and orchestration tools (Kubernetes).
  • Proficiency with C++, CUDA, or other performance-oriented languages/environments.
  • Familiarity with distributed training frameworks (MPI, Horovod, Ray), hyperparameter tuning, and HPC systems.
  • Hands-on experience with model optimization techniques (quantization, pruning, distillation) and hardware acceleration.
  • Proficiency with MLOps toolchains for CI/CD, experiment tracking, and monitoring (MLflow, Kubeflow, TFX).
  • Knowledge of human-centered AI principles and UX design for model-driven applications.
  • Knowledge of high-assurance AI: formal methods, red-teaming, interpretability, and runtime safety.
  • Strong collaboration skills in dynamic, interdisciplinary teams and experience mentoring junior engineers.
  • Experience developing and deploying large language models, multimodal AI systems, or advanced reinforcement-learning agents.
  • Experience integrating AI workflows with robotics, experimental facilities, or digital twins.
  • Experience contributing to open-source AI frameworks or publishing peer-reviewed research.
  • Experience implementing secure AI workflows in classified or regulated environments.
  • Ability to obtain and maintain an SCI clearance, which may require a polygraph test.

Responsibilities

  • Research, fine-tune, and certify large reasoning models (LLMs, graph neural nets, vision transformers, etc.) for domain tasks in materials science, chemistry, physics, grid controls, and nuclear security
  • Develop and integrate domain foundation models trained or adapted on DOE simulation, experimental, and production data
  • Build AI surrogates to accelerate exascale multiphysics simulations, enabling millisecond-scale predictions
  • Design and implement multi-agent frameworks (hypothesizers, planners, executors, retrievers, assessors) with transparent decision graphs, uncertainty quantification, and audit logs
  • Embed continuous learning pipelines: connect model training/evaluation to live telemetry from HPC clusters, experiments, and autonomous labs
  • Establish a model repository with metadata, SBOMs, versioning, drift/poisoning surveillance, and periodic recertification
  • Implement high-assurance controls: least-privilege execution, runtime shields/tripwires, deterministic fallbacks, cryptographic provenance, and enclave attestation for sensitive workloads
  • Collaborate with Data and Infrastructure teams to align model requirements with data lakehouses, compute fabric, and edge inference systems
  • Contribute to open-source and internal AI frameworks, toolkits, and best practices for agentic workflows

Benefits

  • Career advancement and enrichment opportunities
  • Flexible work arrangements for many positions include 9/80 (work 80 hours every two weeks, with every other Friday off) and 4/10 (work 4 ten-hour days each week) compressed workweeks, part-time work, and telecommuting (a mix of onsite work and working from home)
  • Generous vacation, strong medical and other benefits, competitive 401(k), learning opportunities, relocation assistance, and amenities aimed at creating a solid work/life balance

What This Job Offers

  • Job Type: Full-time
  • Career Level: Senior
  • Industry: National Security and International Affairs
  • Number of Employees: 5,001-10,000 employees
