About The Position

TetraScience is the Scientific Data and AI Company building Tetra OS, the operating system for scientific intelligence. They help leading life sciences firms transform fragmented scientific data into AI-native assets and scientific workflows to accelerate discovery, development, and manufacturing. TetraScience has a growing ecosystem of strategic partners including NVIDIA, Databricks, Thermo Fisher Scientific, Snowflake, Google, and Microsoft. Candidates are encouraged to review “The Tetra Way” by CEO Patrick Grady to understand the company's mission, culture, and expectations.

The role involves building a search platform to help scientists find answers across billions of data points, from chemical structures and assay results to unstructured lab documents and instrument data. This Lead/Principal Platform Engineer will own the full search stack, including indexing and scoring, query understanding and rewriting, retrieval pipelines, and the underlying infrastructure. The position requires applying state-of-the-art classical search methods, custom analyzers, index design, and newer semantic and hybrid retrieval techniques. The engineer will go beyond out-of-the-box OpenSearch to develop custom ranking logic, relevance tuning, and scoring models for massive, heterogeneous scientific datasets.

This is a hands-on technical leadership role, involving coding, system architecture, mentoring engineers, and shaping the roadmap for search capabilities. The role often involves translating ambiguous scientific workflows into well-architected search systems with evolving requirements. Daily collaboration with Applied AI Scientists, platform engineers, and product teams is essential to deliver high-performance search services that drive discovery, analysis, and decision-making in bio-pharma R&D.

The domain is bio-pharma R&D, dealing with data types like molecular structures (SMILES), experimental datasets, and knowledge graphs. While cheminformatics knowledge is not required initially, an excitement to apply deep search expertise to novel and complex data types is expected.

Requirements

  • 10+ years of backend or platform engineering experience building distributed, production-grade systems.
  • Hands-on experience with search technologies such as Elasticsearch/OpenSearch, Lucene, or vector databases: not just deployment, but custom configuration, relevance tuning, and performance optimization at scale.
  • Strong understanding of semantic and hybrid retrieval: embeddings, transformer models, vector similarity, ranking logic, relevance tuning, and how to blend them with classical keyword search.
  • Expert-level coding skills in TypeScript and Python for building robust APIs and backend services.
  • Proven ability to build and operate search infrastructure on cloud platforms (AWS preferred), including containerization, CI/CD, observability, and capacity planning.
  • Familiarity with scientific or unstructured data processing, such as documents, tables, analytical results, or experimental datasets.
  • Excellent communication and collaboration skills; comfortable working alongside scientists, AI researchers, and product teams.
  • Exposure to NLP, LLMs, embedding generation, or retrieval-augmented workflows.
  • Experience with vector databases or embedding stores (e.g., OpenSearch) to support semantic search and RAG.
  • Strong problem-solving skills and comfort navigating ambiguity, translating loosely defined scientific workflows and user needs into well-engineered search systems.

Nice To Haves

  • Contributions to open-source search projects (Apache Lucene, Solr, OpenSearch, or similar) or active involvement in the search engineering community.
  • Experience with cheminformatics tools and libraries (e.g., RDKit), including molecular fingerprints, similarity metrics, or substructure search.
  • Prior experience implementing chemical search systems, such as SMILES parsing, normalization, or chemical indexing.
  • Experience with entity resolution, knowledge graphs, or NLP pipelines that enrich search corpora.
  • Experience with large-scale data platforms such as Databricks, Lakehouse architectures, or distributed indexing systems.

Responsibilities

  • Architect a full-stack Search Platform spanning every layer: indexing and scoring; query understanding, rewriting, and federation; and extensible search experiences.
  • Continuously improve search quality through evaluation metrics such as precision@K, recall@K, MRR, and relevance testing with real scientific use cases.
  • Engineer sophisticated hybrid search pipelines that blend sparse (keyword), structured (metadata), and dense (vector) retrieval. You will go beyond out-of-the-box OpenSearch to design custom ranking logic, reciprocal rank fusion, and relevance tuning that surfaces the exact "needle in the haystack" for drug discovery.
  • Lead by example: write code, review designs, and set the standard for engineering quality on the Search Platform team. Mentor engineers and help grow the team's search and distributed systems expertise.
  • Contribute to architectural decisions, technical strategy, and platform-wide improvements to accelerate scientific insight generation.
  • Own and operate the Search Platform infrastructure, ensuring high availability, scalability, performance, and observability across indexing, embedding generation, and query execution.
  • Develop and maintain backend services and APIs in Python and TypeScript that power search capabilities for scientists, data engineers, and AI applications.
  • Ensure security, compliance, and tenant isolation as part of operating search services in enterprise bio-pharma environments.
  • Collaborate with Applied AI Scientists to integrate embeddings, transformer models, and chemical fingerprints into production search workflows.
  • Architect and implement scientific entity resolution and knowledge graph pipelines to transform raw text into interconnected knowledge. You will design systems that extract and link chemical and biological entities (NER/NED) from unstructured documents, enabling the search engine to "understand" relationships between compounds, targets, and assays.

Benefits

  • 100% employer-paid benefits for all eligible employees and immediate family members
  • Unlimited paid time off (PTO)
  • 401(k)
  • Flexible working arrangements - Remote work
  • Company-paid life insurance and long-term/short-term disability (LTD/STD)
  • A culture of continuous improvement where you can grow your career and get coaching