Sr. Developer

Thermo Fisher Scientific•Waltham, MA

$178,131 - $186,000•Remote

About The Position

Thermo Fisher Scientific Inc. is seeking a Sr. Developer for its Waltham, MA location. The role involves designing and implementing scalable data pipelines using AWS services and developing generative AI use cases. It requires expertise in molecule-level data, LLM development, and integrating AI systems with enterprise data ecosystems. The Sr. Developer will also establish LLM evaluation frameworks, conduct architecture reviews, implement CI/CD pipelines, and develop intelligent knowledge workflows and process mining dashboards using Celonis EMS.

Requirements

Bachelor’s degree in Computer Science, Computer Engineering, or related field of study.
7 years of Software Engineering, Data Engineering, or related experience.
Proficiency in programming languages: Python, R, Scala, and Java.
Experience designing and implementing scalable, distributed data pipelines using AWS services such as S3, Redshift, Glue, Lambda, EMR, Athena, and Kinesis, with transformation logic developed in PySpark, Python, and SQL.
Proficiency in Linux/Unix environments, including shell scripting (Bash) for automation and system operations.
Experience with relational databases such as MySQL, PostgreSQL, and SQL Server, as well as cloud data warehouses such as Amazon Redshift.
Experience architecting and leading the implementation of a comprehensive security framework for the Databricks platform, including identity and access management (IAM), data governance, network security, encryption, and audit controls.
Experience defining and enforcing enterprise-grade security standards across Databricks workspaces, Unity Catalog, and associated data pipelines, ensuring alignment with organizational policies and industry best practices.
Experience implementing and managing user access provisioning and authentication through Microsoft Entra ID (formerly Azure AD), including SCIM-based group provisioning, SSO integration, RBAC policies, and conditional access for Databricks.
Deep domain expertise in molecule-level data, applied to uncover strategic insights and identify business opportunities across the drug development lifecycle, including linking molecular entities, manufacturers, regulatory events, and clinical stage indicators to support asset evaluation and portfolio optimization.
Experience managing Databricks Unity Catalog with Terraform, including configuration of external locations, catalogs, schemas, and access controls.
Proficiency in automating data governance and access management through Terraform modules that provision Unity Catalog resources and integrate securely with cloud storage.
Experience implementing automated CI/CD pipelines with GitHub, GitHub Actions, Jenkins, and Airflow, enabling modular, version-controlled deployment of infrastructure.
Experience developing and deploying machine learning models using supervised learning (linear regression, logistic regression, decision trees, random forests), unsupervised learning (k-means clustering, PCA), and deep learning (neural networks, CNNs, RNNs) to generate actionable insights and improve metrics.
Experience applying time series forecasting models such as ARIMA, Prophet, and LSTM for predictive analytics on temporal datasets.
Experience applying NLP and text analytics techniques, including text preprocessing, TF-IDF, Word2Vec embeddings, and transformer-based models such as BERT, for text classification and entity recognition.
Experience creating interactive visualizations in Power BI on top of Databricks Delta tables for real-time analytics, and developing in-depth exploratory visualizations with Matplotlib, Seaborn, and Plotly.
Experience developing and maintaining interactive dashboards and visualizations in Amazon QuickSight, using data processed and stored in Delta Lake.

Nice To Haves

Master’s degree in Computer Science, Computer Engineering, or related field of study plus 5 years of Software Engineering, Data Engineering, or related experience.

Responsibilities

Design and implement scalable, distributed data pipelines using AWS services such as S3, Redshift, Glue, Lambda, EMR, Athena, and Kinesis, with transformation logic developed in PySpark, Python, and SQL.
Architect and lead the implementation of a comprehensive security framework for the Databricks platform, including identity and access management (IAM), data governance, network security, encryption, and audit controls.
Define and enforce enterprise-grade security standards across Databricks workspaces, Unity Catalog, and associated data pipelines, ensuring alignment with organizational policies and industry best practices.
Implement and manage user access provisioning and authentication through Microsoft Entra ID (formerly Azure AD), including SCIM-based group provisioning, SSO integration, RBAC policies, and conditional access for Databricks.
Apply deep domain expertise in molecule-level data to uncover strategic insights and identify business opportunities across the drug development lifecycle. This includes interpreting and linking molecular entities, manufacturers, regulatory events, and clinical stage indicators to support asset evaluation and portfolio optimization.
Lead the design, development, and deployment of generative AI use cases, taking them from ideation through production implementation, ensuring long-term scalability and maintainability.
Develop and fine-tune large language model (LLM) applications, including prompt engineering strategies, reusable prompt templates, and context augmentation techniques for improved response accuracy and relevance.
Integrate generative AI systems with enterprise data ecosystems using REST APIs, vector databases, knowledge graphs, orchestration frameworks, and other scalable backend components.
Establish robust LLM evaluation and monitoring frameworks, defining key metrics for measuring model accuracy, relevance, safety, and overall production performance.
Collaborate cross-functionally with engineering, data science, product, and business stakeholders to prioritize and deliver impactful, responsible AI solutions aligned with business goals.
Conduct architecture reviews and optimize end-to-end data pipelines and GenAI workflows for cost efficiency, runtime performance, and scalability in multi-cloud environments.
Implement CI/CD pipelines using GitHub and GitHub Actions, enabling modular, version-controlled deployment of infrastructure, data products, and AI applications.
Develop intelligent knowledge workflows using Flowise, retrieval-augmented generation (RAG), function calling, SQL orchestration, and webhook integrations to support dynamic use cases.
Design and build process mining dashboards using Celonis EMS, including KPI definitions, root cause analysis using Process Query Language (PQL), and operational insights.
Automate enterprise workflows through Celonis Action Flows, integrating with SAP, Salesforce, and other business platforms to enable process optimization.
Model enterprise process data within the Celonis Data Model (CDM) and configure scalable data pipelines using Celonis Data Integration for high-performance analytics.

Benefits

A choice of national medical and dental plans, and a national vision plan, including health incentive programs
Employee assistance and family support programs, including commuter benefits and tuition reimbursement
At least 120 hours paid time off (PTO)
10 paid holidays annually
Paid parental leave (3 weeks for bonding and 8 weeks for caregiver leave)
Accident and life insurance
Short- and long-term disability
Competitive 401(k) U.S. retirement savings plan
Employees’ Stock Purchase Plan (ESPP) offers eligible colleagues the opportunity to purchase company stock at a discount
