High Performance Computing Platform Engineer
PDT Partners · Posted: February 7, 2023 · Onsite
About the position
The High Performance Computing Platform Engineer designs, implements, and supports scalable and performant HPC systems. The engineer works closely with other platform teams and collaborates with engineers and researchers to build high-quality, reliable systems. The role also involves implementing automation, managing capacity, optimizing benchmarks, and contributing to the day-to-day running of the platform systems. The ideal candidate has experience in systems programming and/or software engineering, as well as practical experience supporting and improving production systems.
Responsibilities
- Design, implement, and deliver scalable and performant systems
- Implement automation for the platform infrastructure
- Collaborate closely with peer engineers and/or researchers to build high-quality, efficient, and reliable systems
- Manage capacity and optimize benchmarks for critical workloads
- Run and support platform systems day-to-day through automation and quality work
Requirements
- Implement automation for CI/CD pipelines and production metrics
- Collaborate closely with engineers and researchers to build high-quality systems
- Contribute to the day-to-day running and support of platform systems
- Experience with systems programming and/or software engineering
- Practical experience supporting, debugging, and improving production systems and services
- Experience using Linux and other open source software
- Experience with configuration management and infrastructure-as-code frameworks
- Production experience working with a public cloud, AWS preferred
- Experience with distributed parallel filesystems (Lustre, GPFS, parallel NFS)
- Experience with batch scheduling systems (Slurm, Torque, SGE, AWS Batch, AWS ParallelCluster)
- Experience with high-performance networking
- Bachelor's or Master's degree in an Engineering or Applied Sciences field from a rigorous academic program, or equivalent professional experience
Benefits
- Salary range between $195,000 and $225,000 (excluding potential bonus amounts)