About The Position

This role designs, develops, and maintains Extract/Transform/Load (ETL) workflows and large-scale data pipelines using Python, SQL, Hive, PySpark, and BigQuery on Hadoop and Google Cloud (Cloud Composer, Airflow, Dataproc); builds and deploys machine learning models to optimize marketing, sales, and operational strategies; and delivers Large Language Model (LLM)-based solutions, including Retrieval-Augmented Generation (RAG) architectures and agentic AI workflows. Full duties are listed under Responsibilities below. Position may work at various and unanticipated worksites throughout the United States. Telecommuting permitted.

Requirements

  • Python
  • SQL (Structured Query Language)
  • Hive
  • PySpark
  • BigQuery
  • Hadoop
  • Cloud Composer
  • Airflow DAGs
  • Dataproc clusters
  • CI/CD pipelines
  • GitHub
  • Jenkins
  • Classification and regression models
  • Uplift modeling
  • Predictive analysis
  • Feature engineering
  • Model validation
  • Hyperparameter tuning techniques
  • A/B testing frameworks
  • Causal inference methods
  • Large Language Models (LLMs)
  • Retrieval-Augmented Generation (RAG) architectures
  • Vector databases
  • Embedding models
  • Prompt engineering strategies
  • Supervised fine-tuning methodologies
  • Agentic AI workflows
  • Shell scripting
  • Scheduling tools like Zeke and Airflow
  • Data dictionaries
  • Metadata repositories
  • Tableau
  • Power BI

Responsibilities

  • Design, develop, and maintain Extract/Transform/Load (ETL) workflows and large-scale data pipelines using Python, SQL (Structured Query Language), Hive, PySpark, and BigQuery (see the PySpark sketch after this list).
  • Orchestrate, optimize, and deploy end-to-end data pipelines for ingesting, processing, and transforming large volumes of structured and unstructured data using Hadoop, Cloud Composer, Airflow DAGs, and Dataproc clusters (see the Airflow sketch below).
  • Implement CI/CD pipelines using GitHub and Jenkins to automate deployments and ensure high availability of data pipelines.
  • Develop and deploy classification and regression models, uplift models, and predictive analyses to optimize marketing, sales, and operational strategies.
  • Apply feature engineering, model validation, and hyperparameter tuning techniques to enhance model accuracy and robustness.
  • Implement A/B testing frameworks and causal inference methods to assess and refine data-driven decisions (see the A/B test sketch below).
  • Design and implement Large Language Model (LLM)-based solutions for automated customer interactions, intelligent search, and content generation.
  • Build and optimize Retrieval-Augmented Generation (RAG) architectures leveraging vector databases and embedding models for domain-specific knowledge retrieval (see the retrieval sketch below).
  • Develop prompt engineering strategies, supervised fine-tuning methodologies, and Agentic AI workflows to create adaptive and autonomous AI-driven solutions.
  • Use shell scripting and scheduling tools such as Zeke and Airflow to manage and monitor job execution.
  • Develop data dictionaries and metadata repositories to document data lineage and improve accessibility.
  • Work closely with cross-functional teams, including business analysts, data scientists, and IT teams, to ensure seamless integration of data solutions.
  • Create and manage data models behind interactive Tableau and Power BI dashboards that provide key insights to business users and management.
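
For illustration, a minimal PySpark ETL sketch in the spirit of the first responsibility above. The table name, columns, and output path are hypothetical, not part of the posting:

```python
# Minimal PySpark ETL sketch; table name, columns, and path are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_etl").enableHiveSupport().getOrCreate()

# Extract: read raw orders from a Hive table.
raw = spark.table("raw_db.orders")

# Transform: enforce types, drop bad rows, aggregate daily revenue.
daily = (
    raw.withColumn("order_ts", F.to_timestamp("order_ts"))
       .filter(F.col("amount") > 0)
       .groupBy(F.to_date("order_ts").alias("order_date"))
       .agg(F.sum("amount").alias("revenue"), F.count("*").alias("orders"))
)

# Load: write partitioned Parquet for downstream consumers.
daily.write.mode("overwrite").partitionBy("order_date").parquet("/warehouse/curated/daily_orders")
```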
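
Similarly, a minimal sketch of the orchestration side, assuming Airflow 2.4+ (for the `schedule` argument); the DAG id, schedule, and task callables are hypothetical:

```python
# Minimal Airflow DAG sketch (Airflow 2.4+); ids, schedule, and callables are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest():
    ...  # pull raw files into the landing zone

def transform():
    ...  # e.g., submit the PySpark job sketched above

with DAG(
    dag_id="daily_orders_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    ingest_task = PythonOperator(task_id="ingest", python_callable=ingest)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    ingest_task >> transform_task  # ingest must finish before transform starts
```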
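
For the A/B testing responsibility, a minimal sketch of a two-proportion z-test on conversion counts, using statsmodels; all numbers are hypothetical:

```python
# Minimal A/B test sketch: two-proportion z-test; all counts are hypothetical.
from statsmodels.stats.proportion import proportions_ztest

conversions = [420, 468]      # conversions in variant A and variant B
exposures = [10_000, 10_000]  # users exposed to each variant

stat, p_value = proportions_ztest(count=conversions, nobs=exposures)
print(f"z = {stat:.2f}, p = {p_value:.4f}")  # a small p-value suggests a real lift
```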
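
Finally, a minimal sketch of the retrieval step in a RAG architecture. The `embed()` function and in-memory index are hypothetical stand-ins for a real embedding model and vector database:

```python
# Minimal RAG retrieval sketch; embed() and the in-memory index are hypothetical
# stand-ins for a real embedding model and vector database.
import numpy as np

def embed(text: str) -> np.ndarray:
    # Hypothetical: a real system would call an embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

docs = ["Refund policy: ...", "Shipping times: ...", "Warranty terms: ..."]
index = np.stack([embed(d) for d in docs])  # toy "vector database"

def retrieve(query: str, k: int = 2) -> list[str]:
    scores = index @ embed(query)  # cosine similarity, since vectors are unit-norm
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

# Retrieved passages ground the LLM prompt for domain-specific answers.
context = "\n".join(retrieve("How long does shipping take?"))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: How long does shipping take?"
```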

What This Job Offers

  • Job Type: Full-time
  • Career Level: Senior
  • Education Level: None listed
  • Number of Employees: 5,001-10,000
