Data Architect (AWS & Python FastAPI)

DataArt, Belgrade, MT

About The Position

Our client is a leading legal recruiting company building a cutting-edge, data-driven platform for lawyers and law firms. The platform consolidates news and analytics, real-time deal and case tracking from multiple sources, firm and lawyer profiles with cross-linked insights, rankings, and more, all in one unified place. We are seeking a skilled Data Architect with strong expertise in AWS technologies (EMR, SageMaker) and Python (FastAPI) to lead the design and implementation of the platform's data architecture. This role involves defining data models, building ingestion pipelines, applying AI-driven entity resolution, and managing scalable, cost-effective infrastructure aligned with cloud best practices.

Requirements

  • Proven experience as a Data Architect or Senior Data Engineer working extensively with AWS services, especially EMR and SageMaker.
  • Strong proficiency in Python development, preferably with FastAPI or similar modern frameworks.
  • Deep understanding of data modeling principles, entity resolution, and schema design for complex data systems.
  • Hands-on experience designing and managing scalable data pipelines, workflows, and AI-driven data processing.
  • Familiarity with relational databases such as PostgreSQL.
  • Strong knowledge of cloud infrastructure cost optimization and performance tuning.
  • Excellent problem-solving skills and ability to work in a collaborative, agile environment.

Nice To Haves

  • Experience within legal tech or recruiting data domains.
  • Familiarity with Content Management Systems (CMS) for managing data sources.
  • Knowledge of data privacy, security regulations, and compliance standards.

Responsibilities

  • Define entities, relationships, and persistent IDs; enforce the Fact schema with confidence scores, timestamps, validation status, and source metadata.
  • Blueprint ingestion workflows from law firm site feeds; normalize data, extract entities, classify content, and route low-confidence items for review.
  • Develop a hybrid of deterministic rules and LLM-assisted matching; configure thresholds for auto-accept, manual review, or rejection.
  • Specify Ops Portal checkpoints, data queues, SLAs, and create a corrections/version history model.
  • Stage a phased rollout of data sources, from ingestion through processing, storage, and replication to management via the CMS.
  • Align architecture with AWS and Postgres baselines; design for scalability, appropriate storage tiers, and cost-effective compute and queuing solutions.
  • Utilize AWS services such as EMR for big data processing and SageMaker for AI/ML workflows.
  • Develop robust backend APIs using Python FastAPI for data services and platform integrations.
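To make the responsibilities above concrete, here is a minimal sketch of a Fact record with confidence scores, timestamps, validation status, and source metadata, plus threshold-based routing into auto-accept, manual review, or rejection. All names, fields, and threshold values are illustrative assumptions, not the client's actual design; stdlib dataclasses are used for brevity, where a FastAPI service would typically use Pydantic models.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class ValidationStatus(str, Enum):
    PENDING = "pending"
    VALIDATED = "validated"
    REJECTED = "rejected"


@dataclass
class Fact:
    """One extracted fact with confidence, timestamp, status, and source metadata.

    Field names are hypothetical; the actual Fact schema is defined on the job.
    """
    entity_id: str                  # persistent ID of the subject entity
    predicate: str                  # e.g. "advised_on" (illustrative)
    value: str
    confidence: float               # 0.0 to 1.0, from rules or LLM-assisted matching
    status: ValidationStatus = ValidationStatus.PENDING
    extracted_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    source_url: Optional[str] = None  # provenance of the ingested item


# Hypothetical thresholds; in practice these would be tuned per source.
AUTO_ACCEPT = 0.90
REVIEW_FLOOR = 0.60


def route(fact: Fact) -> str:
    """Route a fact by confidence: accept, queue for Ops review, or reject."""
    if fact.confidence >= AUTO_ACCEPT:
        return "auto_accept"
    if fact.confidence >= REVIEW_FLOOR:
        return "manual_review"  # lands in an Ops Portal review queue
    return "reject"
```

In this sketch, anything between the two thresholds goes to a manual review queue, matching the posting's requirement to route low-confidence items for human review rather than discarding them.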


What This Job Offers

Job Type: Full-time
Career Level: Mid Level
Industry: Publishing Industries
Education Level: No Education Listed
Number of Employees: 5,001-10,000 employees
