Pitisci & Associates • Posted 9 days ago
Mid Level
Hybrid • Saint Petersburg, FL

Our client, located in St. Petersburg, FL, is seeking a Data Engineer to build and maintain data pipelines that connect Oracle-based source systems to AWS cloud environments, providing well-structured data for analysis and machine learning in AWS SageMaker. The role involves working closely with data scientists to deliver scalable data workflows as a foundation for predictive modeling and analytics.

Responsibilities:

  • Develop and maintain data pipelines to extract, transform, and load data from Oracle databases and other systems into AWS environments (S3, Redshift, Glue, etc.).
  • Collaborate with data scientists to ensure data is prepared, cleaned, and optimized for SageMaker-based machine learning workloads.
  • Implement and manage data ingestion frameworks, including batch and streaming pipelines.
  • Automate and schedule data workflows using AWS Glue, Step Functions, or Airflow.
  • Develop and maintain data models, schemas, and cataloging processes for discoverability and consistency.
  • Optimize data processes for performance and cost efficiency.
  • Implement data quality checks, validation, and governance standards.
  • Work with DevOps and security teams.
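As a rough illustration of the transform stage such a pipeline might contain, here is a minimal sketch in Python using pandas. The table and column names (`order_id`, `order_ts`, `amount`) are hypothetical; the Oracle extract and the boto3 upload to S3 are indicated in comments rather than executed:

```python
import io

import pandas as pd

# In the real pipeline, `raw` would come from an Oracle extract, e.g. via
# oracledb/pyodbc: pd.read_sql("SELECT ... FROM orders", connection)

def transform(raw: pd.DataFrame) -> pd.DataFrame:
    """Clean and prepare rows for downstream analytics or SageMaker training."""
    df = raw.dropna(subset=["order_id"]).copy()      # basic data-quality check
    df["order_ts"] = pd.to_datetime(df["order_ts"])  # normalize timestamps
    df["amount"] = df["amount"].astype("float64")    # enforce consistent dtypes
    return df

def to_csv_bytes(df: pd.DataFrame) -> bytes:
    """Serialize the cleaned frame; in the real pipeline these bytes would be
    uploaded with boto3, e.g. s3.put_object(Bucket=..., Key=..., Body=...)."""
    return df.to_csv(index=False).encode("utf-8")

# Hypothetical sample rows standing in for an Oracle extract.
raw = pd.DataFrame({
    "order_id": [1, 2, None],
    "order_ts": ["2024-01-01", "2024-01-02", "2024-01-03"],
    "amount": ["10.5", "20.0", "30.0"],
})
clean = transform(raw)
```

In a production workflow, a step like this would typically run inside an AWS Glue job or an Airflow task, with the extract and load stages handled by the orchestrator's connections rather than inline credentials.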
Requirements:

  • Strong proficiency with SQL and hands-on experience working with Oracle databases.
  • Strong experience in batch processing.
  • Experience designing and implementing ETL/ELT pipelines and data workflows.
  • Hands-on experience with AWS data services, such as S3, Glue, Redshift, Lambda, and IAM.
  • Proficiency in Python for data engineering (pandas, boto3, pyodbc, etc.).
  • Solid understanding of data modeling, relational databases, and schema design.
  • Familiarity with version control, CI/CD, and automation practices.
  • Ability to collaborate with data scientists to align data structures with model and analytics requirements.
  • B.S. in Computer Science, MIS, or a related field and a minimum of five (5) years of related experience, or an equivalent combination of education, training, and experience.
Preferred Qualifications:

  • Experience integrating data for use in AWS SageMaker or other ML platforms.
  • Exposure to MLOps or ML pipeline orchestration.
  • Familiarity with data cataloging and governance tools (AWS Glue Catalog, Lake Formation).
  • Knowledge of data warehouse design patterns and best practices.
  • Experience with data orchestration tools (e.g., Apache Airflow, Step Functions).
  • Working knowledge of Java is a plus.