Senior Research Software Engineer

Harvard University•Cambridge, MA

Hybrid

About The Position

The Harvard Data Science Initiative (HDSI) is hiring a Senior Research Software Engineer (RSE) to support a portfolio of faculty-led research projects under the HDSI–AWS Impact Computing Alliance. This role is designed for an engineer who thrives in research settings and enjoys translating scientific goals into robust, efficient, and reproducible AI/ML systems. Rather than being tied to a single lab, the RSE will provide shared, cross-project engineering support—helping multiple teams accelerate discovery by building and optimizing machine learning infrastructure, improving performance on modern hardware (including AI accelerators), and enabling scalable execution in AWS and HPC environments. Projects may span domains such as climate and environmental science, global health, and other areas aligned with the alliance’s mission to deliver measurable social and environmental impact. This is a hands-on role with strong collaboration expectations: you’ll work directly with researchers, HDSI technical leadership, and the alliance team to deliver production-grade research software and reusable technical patterns that benefit multiple projects across the Impact Computing umbrella. This position is a benefits-eligible, two-year term appointment through June 30, 2028.

Requirements

Minimum of seven years’ post-secondary education or relevant work experience
BS or MS (or equivalent practical experience) in Computer Science, Computer Engineering, Data Science, or a closely related field
Strong programming skills in Python/C/C++
Experience working with ML frameworks such as PyTorch, TensorFlow, JAX, XLA, Triton, ONNX, Caffe2, or TensorRT
Proven experience in deep learning at scale and familiarity with the “alphabet soup” of distributed computing: data, tensor, sequence, context, and expert parallelism (DP, TP, SP, CP, EP)
Experience with production environments, including Git-based workflows
Experience working in AWS cloud or HPC environments used for large-scale computation
Prior experience in a research or research-adjacent environment, with an understanding of the scientific software lifecycle
Strong communication skills and a collaborative working style

Nice To Haves

Contributions to compiler infrastructures and optimization frameworks (e.g., MLIR, LLVM, XLA, TVM, IREE, Halide)
Experience developing or optimizing high-performance libraries or kernels (e.g., cuBLAS, cuDNN, CUTLASS, HIP, ROCm, or similar)
Experience with distributed AI/ML training and performance optimization (e.g., PyTorch DDP, FSDP, DeepSpeed)
Experience building tooling for runtime analysis, profiling, and performance diagnostics
Experience with secure or privacy-constrained data environments (e.g., HIPAA-aware engineering practices)
Experience working in interdisciplinary research areas such as climate, environment, health, or astrophysics
Completion of the foundational courses specified by Harvard IT Academy (or an external equivalent)

Responsibilities

Design, build, and maintain ML/AI systems and research software in Python and C/C++
Develop and optimize machine learning training and inference pipelines for accelerator-based systems
Apply systems- and compiler-level optimizations, including loop transformations, vectorization, parallelization, and hardware-specific tuning (e.g., SIMD)
Implement and optimize kernels using CUDA, OpenMP, OpenCL, or accelerator-specific programming models
Contribute to or integrate with compiler and IR frameworks such as MLIR, LLVM, XLA, IREE, TVM, or Halide
Analyze and improve performance using profiling and diagnostics focused on latency, memory bandwidth, I/O throughput, and compute utilization
Support execution in AWS cloud and HPC environments, including large-scale model training, profiling, debugging, scaling, cost/performance tuning, reliability, CI/testing, packaging, deployment, and reproducibility engineering
Follow and promote modern ML and scientific software best practices: experiment tracking, reproducibility, version control, testing, packaging, and documentation
Collaborate closely with faculty, researchers, and AWS consulting partners on systems engineering, performance optimization, ML infrastructure, compiler/framework integration, and cloud/HPC execution
Communicate technical findings, tradeoffs, and progress clearly to research stakeholders (including documentation and handoff-ready tooling)

Benefits

Generous paid time off including parental leave
Medical, dental, and vision health insurance coverage starting on day one
Retirement plans with university contributions
Wellbeing and mental health resources
Support for families and caregivers
Professional development opportunities including tuition assistance and reimbursement
Commuter benefits, discounts, and campus perks
