Senior Principal Data Scientist - Cataloging & Metadata

Johnson & Johnson, Titusville, NJ
Hybrid

About The Position

At Johnson & Johnson, we believe health is everything. Our strength in healthcare innovation empowers us to build a world where complex diseases are prevented, treated, and cured, where treatments are smarter and less invasive, and where solutions are personal. Through our expertise in Innovative Medicine and MedTech, we are uniquely positioned to innovate across the full spectrum of healthcare solutions today to deliver the breakthroughs of tomorrow and profoundly impact health for humanity. Learn more at https://www.jnj.com

Job Function: Data Analytics & Computational Sciences
Job Sub Function: Data Science
Job Category: Scientific/Technology
All Job Posting Locations: Titusville, New Jersey, United States of America

Job Description

Johnson and Johnson Innovative Medicine (J&J IM), a pharmaceutical company of Johnson & Johnson, is recruiting for a Cataloging Data Scientist. This position has a primary location of Titusville, NJ, but is also open to candidates in Cambridge, Boston, and Madrid, Spain.

About Innovative Medicine

Our expertise in Innovative Medicine is informed and inspired by patients, whose insights fuel our science-based advancements. Visionaries like you work on teams that save lives by developing the medicines of tomorrow. Join us in developing treatments, finding cures, and pioneering the path from lab to life while championing patients every step of the way. Learn more at https://www.jnj.com/innovative-medicine

We are searching for the best talent for Senior Principal Data Scientist - Cataloging & Metadata.

Purpose

We are seeking a Sr. Principal Data Scientist for the Cataloging, Metadata and Governance Team to design, develop, and implement automated AI solutions that address complex enterprise business challenges.
In addition to building robust AI models, you will play a key role in enhancing data quality by curating, validating, and enriching metadata from multiple sources and by performing rigorous quality checks across all relevant data fields. You will collaborate closely with cross-functional teams, including Data Management, Platform Teams, Product Owners, and Business stakeholders, to support catalog automation initiatives, improve data catalog usability, and ensure seamless data access for analytics and informed decision-making. You will also design, develop, and implement generative AI solutions that are closely integrated with cataloging processes, enhancing the discoverability, findability, and overall usability of the data catalog and driving continuous improvement in data management and discovery.

Requirements

  • Master's/PhD in Life Sciences with a master's in Computer Science, Data Science, or Information Systems (or an equivalent degree)
  • 7+ years of experience in computational biology, automation, data cataloging (platforms such as TileDB, Collibra, Alation, etc.), business analysis, data science, or related fields, preferably within Life Sciences or a regulated industry
  • Familiarity with data engineering, automation, data management, data compliance, data quality, data governance, and AI solutions
  • 6+ years of hands-on experience with Python, SQL, and other AI automation tools
  • Strong Python skills, including API integration and backend development using FastAPI or Flask
  • Experience with data cataloging platforms and metadata extraction via APIs
  • Experience with databases (Snowflake, Postgres) and version control (Git)
  • Hands-on experience building and deploying generative AI solutions
  • Strong troubleshooting skills across pipelines, APIs, and data flows
  • Strong stakeholder management skills with the ability to successfully drive solutions independently
  • Strong people management skills with the ability to mentor and guide resources.
  • Strong communication skills with the ability to work seamlessly across technical and business teams
  • Strong sense of ownership and accountability in managing critical tasks and responsibilities to ensure successful project outcomes.

Nice To Haves

  • Experience in setting up automations and building intelligent solutions (machine-readable metadata, profiling, validation rules, anomaly detection, etc.)
  • Excellent attention to detail, data organization, and documentation skills
  • Familiarity with automated metadata ingestion & catalog curation workflows.
  • Ability to translate complex data concepts into clear, accessible documentation.

Responsibilities

  • Own the design, development, and implementation of solutions for the data cataloging, metadata, and governance team.
  • Lead the curation and ongoing management of the enterprise data catalog by capturing, validating, and enriching metadata from diverse sources, ensuring that business terms, data elements, and approved definitions are documented in collaboration with Data Owners and SMEs.
  • Monitor catalog adoption and usage, continuously enhancing catalog usability and searchability so that all critical datasets, data products, and master data entities are indexed, discoverable, and accurately described.
  • Implement rigorous data quality assessments, applying validation and enrichment techniques to maintain the reliability, accuracy, and contextualization of metadata throughout the data lifecycle.
  • Develop and monitor KPIs for metadata quality, completeness, and compliance across domains.
  • Work closely with cross-functional teams, including Knowledge Management, Data Products, and other groups, to integrate catalog automation and metadata capabilities into broader enterprise workflows, supporting seamless data accessibility and governance.
  • Contribute to proof-of-concept/pilot/launch projects that assess data governance and metadata improvements and quantify business value achieved through enhanced data governance.
  • Partner with the DSDH teams to implement automated data governance solutions, along with monitoring and reporting processes.
  • Participate in ontology and knowledge graph initiatives to ensure metadata is harmonized with enterprise semantic frameworks.
  • Collaborate with JJ Technology, Legal, Compliance, external vendors, and other DSDH teams to ensure alignment and traceability between business definitions, technical metadata, and lineage.
  • Design, develop, and deploy generative AI solutions that are integrated with cataloging workflows, further improving the discoverability, accessibility, and overall effectiveness of the data catalog.
  • Establish and configure integrated connections between multiple cataloging platforms to enable seamless data synchronization and automated metadata updates, reinforcing traceability and discoverability across systems.

Benefits

  • Medical
  • Dental
  • Vision
  • Life insurance
  • Short- and long-term disability
  • Business accident insurance
  • Group legal insurance
  • Consolidated retirement plan (pension)
  • Savings plan (401(k))
  • Long-term incentive program
  • Vacation – 120 hours per calendar year
  • Sick time – 40 hours per calendar year; for employees who reside in the State of Washington, 56 hours per calendar year
  • Holiday pay, including Floating Holidays – 13 days per calendar year
  • Work, Personal and Family Time – up to 40 hours per calendar year
  • Parental Leave – 480 hours within one year of the birth/adoption/foster care of a child
  • Condolence Leave – 30 days for an immediate family member; 5 days for an extended family member
  • Caregiver Leave – 10 days
  • Volunteer Leave – 4 days
  • Military Spouse Time-Off – 80 hours
© 2024 Teal Labs, Inc