Senior HPC Software Engineer

Ford Motor Company•Dearborn, MI

4d•$113,580 - $192,900•Onsite

About The Position

We are seeking a senior technical contributor to help support, modernize, and scale our on-premises high-performance computing platform. This role works across Linux systems administration, HPC operations, Kubernetes-based services, automation, observability, software tooling, and user-facing platform delivery. The ideal candidate has deep experience administering RHEL-based systems in complex compute environments and is comfortable troubleshooting issues across operating systems, schedulers, storage, networking, containers, applications, and user workloads.

This person will play a key role in improving the reliability, usability, and operational maturity of the platform. They will help develop and maintain core HPC services, support users running demanding engineering and AI/ML workloads, and create tooling, scripts, APIs, and integrations. Strong software engineering fundamentals are important, including experience with Python, Go, or similar languages, Git-based development workflows, code reviews, testing practices, CI/CD pipelines, documentation, and maintainable code design. Experience with Slurm or other workload managers is highly valued.

We are looking for someone who can balance strong technical depth with a user-focused delivery mindset. This role requires the ability to work collaboratively with platform engineers, application teams, and technical users to identify pain points, resolve production issues, document repeatable processes, and build durable improvements. The right candidate will be pragmatic, a team player, comfortable in a fast-moving environment, and motivated by making complex, massive on-prem infrastructure easier to operate, automate, observe, and continuously improve.

Requirements

Deep experience administering RHEL-based systems in complex compute environments.
Comfortable troubleshooting issues across operating systems, schedulers, storage, networking, containers, applications, and user workloads.
Strong software engineering fundamentals.
Experience with Python, Go, or similar languages.
Experience with Git-based development workflows.
Experience with code reviews.
Experience with testing practices.
Experience with CI/CD pipelines.
Experience with documentation.
Experience with maintainable code design.
Ability to balance strong technical depth with a user-focused delivery mindset.
Ability to work collaboratively with platform engineers, application teams, and technical users.
Pragmatic.
Team player.
Comfortable in a fast-moving environment.

Nice To Haves

Experience with Slurm or other workload managers.

Responsibilities

Support, modernize, and scale the on-premises high-performance computing platform.
Work across Linux systems administration, HPC operations, Kubernetes-based services, automation, observability, software tooling, and user-facing platform delivery.
Administer RHEL-based systems in complex compute environments.
Troubleshoot issues across operating systems, schedulers, storage, networking, containers, applications, and user workloads.
Improve the reliability, usability, and operational maturity of the platform.
Develop and maintain core HPC services.
Support users running demanding engineering and AI/ML workloads.
Create tooling, scripts, APIs, and integrations.
Use Python, Go, or similar languages for software engineering tasks.
Employ Git-based development workflows, code reviews, testing practices, and CI/CD pipelines.
Document processes and code.
Design maintainable code.
Work collaboratively with platform engineers, application teams, and technical users.
Identify pain points.
Resolve production issues.
Build durable improvements.
Operate, automate, observe, and continuously improve complex, massive on-prem infrastructure.