Regeneron Pharmaceuticals-posted 2 months ago
$101,800 - $194,500/Yr
Full-time • Senior
5,001-10,000 employees

The Principal, Data Engineer builds data infrastructure, leads technical initiatives, and mentors junior team members while driving data-driven solutions across the organization. As a Principal, Data Engineer, a typical day might include the following:

  • Design complex data engineering solutions and define standards
  • Mentor junior engineers and drive infrastructure innovation
  • Build scalable, secure data pipelines with robust monitoring
  • Optimize ETL/ELT workflows for large-scale data processing
  • Architect end-to-end data pipeline solutions from ingestion to consumption
  • Implement real-time and batch processing systems to handle diverse biotech data streams
  • Design fault-tolerant pipelines with appropriate error handling and recovery mechanisms
  • Establish CI/CD practices for data pipeline deployment and testing
  • Develop data transformation logic to support analytical and operational needs
  • Integrate disparate data sources including laboratory instruments, clinical systems, and external APIs
  • Implement data validation frameworks to ensure data integrity throughout the pipeline
  • Manage and organize large datasets
  • Ensure data quality and accessibility for data analysts
  • Implement data lake and data warehouse architectures
  • Monitor data pipeline performance and troubleshoot issues
  • Maintain efficiency and reliability of data systems
  • Implement observability solutions for pipeline monitoring
  • Develop automated alerting systems for pipeline failures or anomalies as needed
  • Build and leverage GenAI solutions to improve the performance, speed, and efficiency of the data engineering team
  • Document data processes and systems as required
  • Ensure compliance with data governance policies
  • Strong Python, Java, or Scala programming skills
  • Deep SQL expertise and relational database experience
  • NoSQL and big data technology experience (Hadoop, Spark, Kafka)
  • Proficiency in data modeling and schema design
  • Knowledge of data security and compliance requirements in regulated environments
  • Familiarity with Biotech Enterprise Systems (MES, LIMS, QMS)
  • Excellent communication skills
  • Knowledge of MCP and orchestration platforms related to AI/GenAI
  • Proficiency in star schemas and data modeling tools
  • Knowledge of industry regulatory requirements (CFR Part 11, GxP, CSA)
  • Stream processing experience (Kafka, Flink)
  • Cloud certifications
  • BA/BS in Computer Science, Bioinformatics, or related field
  • Principal: 8+ years meaningful experience or equivalent combination of education and experience
  • Staff: 10+ years relevant experience or equivalent combination of education and experience
  • Experience in biotech, pharmaceutical, or other life sciences industries preferred
  • Cloud platform experience (AWS, Azure) preferred
  • Experience with workflow orchestration tools (Airflow, Luigi, Prefect, or similar)
  • Experience with containerization technologies
  • Experience with scientific data management systems
  • Experience with using GenAI to enhance own work
  • Health and wellness programs
  • Fitness centers
  • Equity awards
  • Annual bonuses
  • Paid time off for eligible employees at all levels
© 2024 Teal Labs, Inc