Senior Infrastructure Engineer

GSK • Upper Providence, PA

About The Position

The Onyx Research Data Platform organization represents a major investment by GSK R&D and Digital & Tech, designed to deliver a step-change in our ability to leverage data, knowledge, and prediction to find new medicines. We are a full-stack shop consisting of product and portfolio leadership, data engineering, infrastructure and DevOps, data / metadata / knowledge platforms, and AI/ML and analysis platforms, all geared toward:

Building a next-generation data experience for GSK’s scientists, engineers, and decision-makers, increasing productivity and reducing time spent on “data mechanics”
Providing best-in-class AI/ML and data analysis environments to accelerate our predictive capabilities and attract top-tier talent
Aggressively engineering our data at scale to unlock the value of our combined data assets and predictions in real-time

A Senior Infrastructure Engineer is a leading technical contributor who can consistently take a loosely defined business or technical requirement, architect and build it to a well-defined specification, and execute on it at a high level. They have a strong focus on metrics, both for the impact of their work and for its inner workings / operations. A Senior Infrastructure Engineer should be deeply familiar with the tools of their specialization and of their customers and engaged with the open-source community surrounding them – potentially, even to the level of contributing pull requests.

Requirements

Bachelor’s degree in Computer Science, Software Engineering, or related discipline
4+ years of Linux Systems Administration experience
Experience with modern software development tools / ways of working (e.g. git/GitHub, DevOps tools, metrics / monitoring…)
Experience troubleshooting and maintaining enterprise Linux hardware and software

Nice To Haves

Deep knowledge and use of at least one common programming or scripting language (e.g., Python, Bash), including toolchains for documentation, testing, and operations / observability
Hands-on experience with CI/CD implementations using git and a common CI/CD stack (e.g. GitLab, Azure DevOps)
Demonstrated excellence with agile software development environments using tools like Jira and Confluence

Responsibilities

Apply broad and deep knowledge of Linux systems administration, storage (e.g. Weka/ZFS), and network configuration
Design, build, and operate tools, services, and workflows that deliver high value by solving key business problems
Work to transition from classical HPC ways of working to a modern "private cloud" approach that focuses on Infrastructure-as-Code, DevOps-driven workflows, and containerization, and helps put site reliability engineering and automation at the core of every running Onyx service
Implement and maintain the observability stack and processes for infrastructure (Grafana, Prometheus, InfluxDB) across all HPC and other computing components
Own the schedulers (e.g. Slurm) and processes that continually improve users' access to resources in the most efficient way
Provide input into the roadmaps of teams representing upstream dependencies, to help improve the overall program of work
Be fully versed in coding best practices and ways of working, and participate in code reviews/partnering to improve the team's standards
Design innovative strategy beyond the current enterprise way of working to create a better environment for the end users, and construct a coordinated, stepwise plan to bring others along the change curve
Provide thought leadership to team members to help others get the job done right the first time
