Reliability Engineer

Danaher, Waltham, MA
About The Position

Bring more to life. Are you ready to accelerate your potential and make a real difference within life sciences, diagnostics and biotechnology? At Abcam, one of Danaher's 15+ operating companies, our work saves lives, and we're all united by a shared commitment to innovate for tangible impact. You'll thrive in a culture of belonging where you and your unique viewpoint matter. And by harnessing Danaher's system of continuous improvement, you help turn ideas into impact - innovating at the speed of life.

Shape the Future with Us! For over 25 years, Abcam has been providing tools the scientific community needs to enable faster breakthroughs in critical areas like cancer, neurological disorders, infectious diseases, and metabolic disorders. We believe that to continue making progress, we need to work together, each bringing our own unique perspectives to make an impact on the world. This community needs people like you: dedicated, agile and above all audacious so we can truly drive science forward. Learn about the Danaher Business System, which makes everything possible.

We are seeking a highly motivated Reliability Engineer to join our team. As a Reliability Engineer, you will play a crucial role in ensuring the stability, performance, and reliability of our production systems. Your responsibilities will include proactively identifying and resolving technical issues, leading major incident responses, and implementing best practices for system reliability. You will work closely with cross-functional teams to develop and maintain robust monitoring and automation solutions. This position reports directly to the Global Reliability Manager.

Requirements

  • Automation & Scripting: Ability to code repeatable tasks using PowerShell, Bash, or Python, and familiarity with infrastructure-as-code tools such as Terraform and configuration management tools such as Puppet.
  • Cloud & Infrastructure: Strong knowledge of AWS Cloud services, networking, security, and storage solutions, both on-premises and in the cloud.
  • Reliability & Scalability: High-level understanding of High Availability, Disaster Recovery, scalability solutions, and web infrastructure troubleshooting using logs.
  • Monitoring & Incident Management: Proficiency with monitoring dashboards (Grafana, Humio, CloudWatch) and incident management tools like ServiceNow and PagerDuty.
  • Database & Pipelines: Good understanding of SQL Server, Oracle, PostgreSQL (including DML), and familiarity with CI/CD pipelines such as GitLab CI.

Nice To Haves

  • EKS troubleshooting knowledge
  • Application support experience
  • Linux OS troubleshooting experience
  • Oracle Cloud Infrastructure knowledge

Responsibilities

  • Shape system reliability at scale by monitoring performance, spotting trends, and preventing issues before they impact users.
  • Take charge during critical moments, leading major incident responses and driving rapid service restoration.
  • Solve complex problems for the long term, collaborating across teams to implement robust, sustainable solutions.
  • Automate and innovate, building tools and processes that streamline operations and reduce manual work.
  • Drive continuous improvement, using data insights and post-incident learnings to make systems more resilient every day.
  • Participate in an on-call rotation to provide 24/7 support for critical systems and respond to incidents as needed.

Benefits

  • Abcam, a Danaher operating company, offers a broad array of comprehensive, competitive benefit programs that add value to our lives. Whether it's a health care program or paid time off, our programs contribute to life beyond the job.
  • Check out our benefits at Danaher Benefits Info.
  • We offer a comprehensive package of benefits, including paid time off, medical/dental/vision insurance, and 401(k), to eligible employees.

What This Job Offers

Job Type

Full-time

Career Level

Mid Level

Education Level

No Education Listed

Number of Employees

5,001-10,000 employees
