COMPANY: Thermo Fisher Scientific Inc.
LOCATION: 168 Third Ave., Waltham, MA 02451
TITLE: Scientist III, Data Engineer
HOURS: Monday to Friday, 8:00 am to 5:00 pm
DUTIES: Develop scalable data pipelines and build out new API integrations to support continuing increases in data volume and complexity. Own and deliver projects and enhancements associated with data platform solutions. Develop solutions using PySpark/EMR, SQL and databases, AWS Athena, S3, Redshift, AWS API Gateway, Lambda, Glue, and other data engineering technologies. Write complex queries and edit them as required to implement ETL/data solutions. Implement solutions using AWS and other cloud platform tools, including GitHub, Jenkins, Terraform, Jira, and Confluence. Follow agile development methodologies to deliver solutions and product features, applying DevOps, DataOps, and DevSecOps practices. Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, and re-designing infrastructure for greater scalability.
TRAVEL: Up to 5% travel required (domestic and international). Can work remotely or telecommute.
REQUIREMENTS:
MINIMUM Education Requirement: Master's degree or foreign degree equivalent in Technology Management, Information Technology, Computer Science, or a related field of study.
MINIMUM Experience Requirement: 3 years of experience as a Data Developer, Data Engineer, or related occupation.
Alternative Education and Experience Requirement: Bachelor's degree or foreign degree equivalent in Technology Management, Information Technology, Computer Science, or a related field of study, plus 5 years of experience as a Data Developer, Data Engineer, or related occupation.
Required knowledge or experience with:
- Full life cycle implementation in AWS using PySpark/EMR, Athena, S3, Redshift, AWS API Gateway, Lambda, and Glue
- Agile development methodologies following DevOps, DataOps, and DevSecOps practices
- ETL pipelines, GitHub, Jenkins, Terraform, Jira, Bitbucket, and Confluence
- Informatica, Databricks, and AWS Glue
- Data lakes using AWS Databricks, Apache Spark, and Python
- Data visualization tools such as Power BI and Tableau
- Data modeling and optimization for OLAP/OLTP systems with Star/Snowflake schemas
- Strong knowledge of SQL, query optimization, and performance tuning in Redshift, Snowflake, or Oracle
- CI/CD pipelines for data workflows using Jenkins, GitHub Actions, or AWS CodePipeline
- Data governance, cataloging, and lineage using tools such as AWS Glue Data Catalog, Collibra, or Alation
- Implementing data security, encryption, IAM policies, and compliance with regulatory frameworks
- Batch and real-time streaming pipelines using Kafka and Spark Streaming
- Managing data governance, access control, and lineage using Databricks Unity Catalog for secure enterprise data sharing
- Implementing Delta Lake architecture for ACID transactions, schema enforcement, and scalable data pipelines
- Optimizing Delta Live Tables for automated ETL orchestration and reliable data delivery
- Ensuring high availability and SLA-driven production support with proactive monitoring, incident management, and root cause analysis
- Collaboration with cross-functional teams to translate scientific, laboratory, and business requirements into scalable data solutions
JOB TYPE: Full-time
CAREER LEVEL: Mid Level
NUMBER OF EMPLOYEES: 5,001-10,000