Data Lake Systems Engineer (Advanced Computing)

Biogen, Cambridge, MA

About The Position

Reporting to the Head of Data Lake and HPC, the Data Lake Systems Engineer will play a critical role in ensuring the usability, availability, and reliability of the Data Lake computational and storage infrastructure. This includes performing systems administration and maintenance tasks, fulfilling user requests, resolving incidents and outages, providing comprehensive user training, maintaining high-quality documentation, participating in the design and deployment of complex IT systems, and collaborating closely with the research community to more effectively leverage our Data Lake infrastructure. This person will oversee a team of contractors to ensure that the key responsibilities listed below are accomplished, and will own delivery of these services to our customers.

Requirements

  • 5-8 years of progressively complex related experience in data engineering.
  • 3 to 4 years of working experience with the Big Data stack on HDP (Hortonworks Data Platform) or similar environments.
  • Expertise in various data

Nice To Haves

  • AWS Big Data Certification is strongly preferred.

Responsibilities

  • Implementation and administration of the Data Lake environment.
  • Monitoring and managing Hadoop services on the HDP Production and DR clusters.
  • Maintenance and monitoring of jobs in the Production, UAT, and Dev environments.
  • Making code changes and deploying updated code in the UAT and Production environments.
  • Deploying code changes to the R Shiny server per user requests.
  • Implementation and monitoring of Oozie scheduled jobs for the UAT, Dev, and Production environments.
  • Implementation of patching activities and application of fixes provided by Hortonworks to the Data Lake environment.
  • Troubleshooting job failures, primarily Hive and Spark jobs, across the Data Lake environment.
  • Onboarding new users to the Hadoop Data Lake environment.
  • Gathering requirements for creating Hive databases and providing policy-based access management through Ranger for new POCs.
  • Supporting developers in executing ad hoc jobs in Hive environments for existing POCs such as enrollment_forecaster.
  • Managing HDFS home directories and enforcing access policies at the Hive schema, table, and column level through Ranger.