PathAI • Posted 7 days ago
Mid Level
Remote • Boston, MA
101-250 employees

PathAI's mission is to improve patient outcomes with AI-powered pathology. Our platform promises substantial improvements to the accuracy of diagnosis and the efficacy of treatment of diseases like cancer by leveraging modern approaches in machine learning. Our team, comprising diverse employees with a wide range of backgrounds and experiences, is passionate about solving challenging problems and making a huge impact.

We are seeking an experienced contract Back-End Developer with strong database and data warehouse skills to enhance the scalability, performance, and maintainability of our ML data infrastructure. The ideal candidate will bring strong expertise in server-side Python development, relational databases, ETL/ELT, and modern big data deployments. You will work closely with our MLOps and ML engineering teams to optimize storage usage, modernize pipelines, deploy new technology, and build or enhance tools that support analytics and machine learning workflows.

Contract Duration: Minimum 6 months
Location: Remote (U.S.)

Responsibilities:

  • Analyze and optimize storage strategies for ML experiment data and metadata.
  • Design and implement intelligent retention and expiration policies for large-scale datasets (see the sketch after this list).
  • Modernize and refactor ETL/ELT pipelines to improve scalability and ease of maintenance.
  • Create and populate additional schemas for validated and curated datasets.
  • Build or enhance database-backed applications supporting ML R&D and production analytics.
  • Collaborate with ML engineers, SREs, and platform teams.
  • Provide knowledge transfer for long-term maintainers.
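
For the retention and expiration work above, the shape of the task looks something like the following minimal sketch. It assumes a Postgres table ml.experiment_events partitioned by month, with child partitions named like experiment_events_2024_01; the schema, naming scheme, and 180-day window are illustrative assumptions, not PathAI's actual setup.

    # Minimal sketch of a retention job for time-partitioned experiment data.
    # Assumes a Postgres table ml.experiment_events partitioned by month, with
    # child partitions named like experiment_events_2024_01. The schema, naming
    # scheme, and 180-day window are illustrative assumptions only.
    from datetime import date, timedelta

    import psycopg2  # assumed client; any DB-API driver works the same way

    RETENTION_DAYS = 180

    def expired_partitions(conn, cutoff):
        """Return child partitions whose month ends before the cutoff date."""
        with conn.cursor() as cur:
            cur.execute(
                """
                SELECT c.relname
                FROM pg_inherits i
                JOIN pg_class c ON c.oid = i.inhrelid
                JOIN pg_class p ON p.oid = i.inhparent
                WHERE p.relname = 'experiment_events'
                """
            )
            names = [row[0] for row in cur.fetchall()]
        expired = []
        for name in names:
            year, month = (int(x) for x in name.rsplit("_", 2)[-2:])
            # First day of the month *after* this partition's month.
            month_end = date(year + (month == 12), month % 12 + 1, 1)
            if month_end <= cutoff:
                expired.append(name)
        return expired

    def drop_expired(conn):
        cutoff = date.today() - timedelta(days=RETENTION_DAYS)
        for name in expired_partitions(conn, cutoff):
            with conn.cursor() as cur:
                # Detach first so a failed DROP never leaves the parent in a
                # half-dropped state; names come from the catalog, not users.
                cur.execute(f'ALTER TABLE ml.experiment_events DETACH PARTITION ml."{name}"')
                cur.execute(f'DROP TABLE ml."{name}"')
            conn.commit()

The same idea extends to S3 object lifecycles and warehouse tables: expiration is driven by an explicit policy rather than ad hoc cleanup.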

Qualifications:

  • Proficiency in Python for application development, data processing, and automation.
  • Expertise with relational databases (e.g., Postgres, Amazon RDS, Aurora), including schema design, query optimization, and performance tuning.
  • Expertise with ELT pipelines (dbt preferred) and cloud data warehousing (Snowflake preferred).
  • Familiarity with big data deployments such as Spark and Hive.
  • Experience with Apache Airflow for systems automation (see the DAG sketch after this list).
  • Understanding of S3-based storage and large-scale data management strategies.
  • Ability to write clear technical documentation and collaborate effectively across teams.
  • Experience with query optimization, data partitioning strategies, and cost optimization in cloud environments.
  • Background in machine learning data pipelines or analytics-heavy environments.
  • Knowledge of data governance, retention policies, or cost-optimization strategies in cloud environments.
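
As a flavor of the Airflow-driven automation mentioned above, here is a minimal sketch of a nightly DAG that runs dbt models and tests before a curated schema is exposed downstream. The DAG id, schedule, project path, and model selector are hypothetical, and the `schedule` argument assumes Airflow 2.4+.

    # Minimal sketch of the kind of Airflow automation described above: a
    # nightly DAG that runs dbt models and tests before a curated schema is
    # exposed downstream. DAG id, schedule, project path, and model selector
    # are hypothetical; the `schedule` argument assumes Airflow 2.4+.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="nightly_curated_refresh",
        start_date=datetime(2024, 1, 1),
        schedule="0 4 * * *",  # nightly at 04:00 UTC
        catchup=False,
    ) as dag:
        # ELT: data is already loaded; transform it inside the warehouse.
        run_models = BashOperator(
            task_id="dbt_run",
            bash_command="dbt run --project-dir /opt/dbt/curated --select curated",
        )
        # Gate downstream consumers on passing tests.
        test_models = BashOperator(
            task_id="dbt_test",
            bash_command="dbt test --project-dir /opt/dbt/curated --select curated",
        )
        run_models >> test_models

Running dbt test as a separate downstream task keeps a failed validation from silently shipping bad data into the curated schema.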