AI Platform & Cloud Engineer

Axle, Rockville, MD

About The Position

Axle is a bioscience and information technology company that offers advancements in translational research, biomedical informatics, and data science applications to research centers and healthcare organizations nationally and abroad. With experts in biomedical science, software engineering, and program management, we focus on developing and applying research tools and techniques to empower decision-making and accelerate research discoveries. We work with some of the top research organizations and facilities in the country, including multiple institutes at the National Institutes of Health (NIH).

Overview

The AI Platform & Cloud Engineer will help sustain the hybrid cloud production environment for the SOM Center’s data ecosystem. This role serves as the technical interface between Data Science and IT, focusing on Platform Engineering: building the internal developer platform (IDP) that uses the IT-managed Kubernetes infrastructure and cloud resources to scale workflow orchestration, knowledge graph data pipelines, and distributed model inference.

Requirements

  • Bachelor’s or master’s degree in computer science or engineering with experience in Cloud Engineering, MLOps, or SRE.
  • Proficiency in Python and Infrastructure as Code concepts, with experience in major cloud platforms (GCP preferred, or AWS).
  • AI Productivity: Demonstrated ability to leverage AI-driven coding assistants and LLMs to increase development velocity and code quality.
  • Experience utilizing Hybrid Cloud architectures and configuring workloads for burst computing (Spot instances, Autoscaling groups).
  • Experience refactoring research-grade code into production-grade services (Docker/Kubernetes).
  • Experience with Workflow Orchestration tools (Airflow, Prefect, or Dagster) and Vector Database administration.

Nice To Haves

  • Experience deploying applications to Kubernetes (GKE/EKS) and using GitOps workflows (ArgoCD/Flux).
  • Knowledge of Graph Database administration (Neo4j) and object storage architectures.
  • Familiarity with Serverless event processing (Cloud Functions) and ML Engineering concepts (quantization, distillation, serving via Triton/vLLM).

Responsibilities

  • IT Collaboration & K8s Support: Collaborate closely with the dedicated IT team to define compute requirements and orchestrate workloads on the new Kubernetes cluster. The engineer will not manage the cluster directly but will ensure data science applications are correctly containerized and configured to run efficiently on the infrastructure provided by IT.
  • Infrastructure Strategy: Define the Infrastructure as Code (IaC) specifications for application-level resources, working with IT to ensure on-premises GPU clusters and public cloud environments (GCP/AWS) are utilized effectively.
  • Refactoring & Model Serving: Transform experimental code (Jupyter Notebooks, R scripts) developed by NLP and Omics researchers into robust, containerized software packages. Deploy and optimize model inference servers (e.g., vLLM, Triton Inference Server) to expose AI models as reliable internal APIs.
  • Workflow Orchestration: Deploy and maintain the Workflow Orchestration platform (e.g., Apache Airflow, Prefect, or Dagster) to manage dependencies between data ingestion, model inference, and state updates, serving as the central execution controller for distributed processes.
  • AI-Assisted Development: Actively utilize AI-assisted coding tools (e.g., GitHub Copilot) to accelerate code generation, documentation, and refactoring processes to increase overall productivity.
  • Data Foundation: Administer the Data Foundation infrastructure, including supporting Graph Databases (e.g., Neo4j), Vector Databases (e.g., Milvus, pgvector) for RAG implementations, and ETL pipelines to ingest massive public datasets (e.g., Human Cell Atlas) into the Data Lake.
  • Cloud Agent Architecture: Architect and deploy managed Cloud AI Agents (e.g., via Vertex AI) to orchestrate complex reasoning workflows, including but not limited to parsing scientific literature, querying omics databases, and validating experimental protocols against Knowledge Graphs.
  • Security Implementation: Collaborate with data scientists to implement Workload Identity federation and secrets management (e.g., Vault), ensuring automated workflows securely authenticate against enterprise resources managed by IT.

Benefits

  • 100% Medical, Dental & Vision Coverage for Employees
  • Paid Time Off and Paid Holidays
  • 401(k) match up to 5%
  • Educational Benefits for Career Growth
  • Employee Referral Bonus
  • Flexible Spending Accounts:
      • Healthcare (FSA)
      • Parking Reimbursement Account (PRK)
      • Dependent Care Assistance Program (DCAP)
      • Transportation Reimbursement Account (TRN)