Data Engineer - Intern

GreenGas USA, Houston, TX
Hybrid

About The Position

The Data Engineer Intern plays a supporting role in advancing the organization’s data-driven initiatives. The intern helps build and maintain parts of the data infrastructure that allow data to be collected, stored, processed, and made accessible for analysis, reporting, and machine learning. Working under the guidance of experienced engineers, the intern gains hands-on experience with data pipelines, data platforms, and data integration processes while contributing to projects that improve the availability, quality, and usability of organizational data.

Requirements

  • Programming Languages: Python, Java, SQL, and related technologies.
  • Database Systems: Strong knowledge of relational databases (e.g., PostgreSQL, MySQL, SQL Server) and NoSQL databases (e.g., MongoDB, Cassandra).
  • Data Warehousing: Experience with data warehousing concepts and platforms (e.g., Snowflake).
  • ETL Tools: Proficiency with ETL tools and platforms such as Snowflake and Databricks.
  • Cloud Platforms: Experience with the Azure suite, plus Jira for DevOps workflows and ticketing.
  • Big Data Technologies: Hadoop, Spark, Kafka, and Hive.
  • Data Modeling and Schema Design: Ability to design efficient data models.
  • Scripting and Automation: Ability to write scripts that automate data processes.

Responsibilities

  • Design and Implement ETL/ELT Processes: Build robust Extract, Transform, Load (ETL) or Extract, Load, Transform (ELT) pipelines to move data from various sources (databases, APIs, streaming sources, external data providers) into data warehouses, data lakes, or other data repositories.
  • Develop and Maintain Scalable Data Pipelines: Create efficient and scalable data pipelines to handle large volumes of structured and unstructured data.
  • Automate Data Workflows: Write code and scripts to automate repetitive tasks in data collection, processing, and delivery (e.g., moving data from databases into Power BI reports).
  • Monitor and Troubleshoot Pipelines: Continuously monitor pipeline performance and identify and resolve data issues, errors, and performance bottlenecks.
  • Design and Maintain Data Architecture: Select appropriate data storage technologies (such as relational databases, NoSQL databases, data lakes, data warehouses, and cloud storage services) and design efficient data models and schemas.
  • Optimize Data Storage and Retrieval: Ensure data is stored in a way that supports high-performance queries and efficient analytical and operational use.
  • Evaluate and Implement Data Solutions: Research and implement new data technologies, tools, and frameworks to improve data infrastructure and processes.
  • Cloud Platform Management: Use Azure to deploy and manage data solutions in the cloud, including automation, databases, and Power BI.
  • Ensure Data Quality and Integrity: Implement validation rules, cleansing procedures, and monitoring systems to detect and rectify anomalies, ensuring data accuracy.
  • Implement Data Governance Frameworks: Define standards and policies for data usage, ensuring consistency, reliability, and compliance (e.g., GDPR, HIPAA).
  • Manage Data Security and Access: Implement security controls and access management policies to protect sensitive information from unauthorized access or theft.
  • Provide Data Access Tools: Set up dashboards, analytics tools, and API endpoints to make processed data accessible to end-users and applications.
  • Document Technical Designs and Workflows: Create comprehensive documentation for data pipelines, architecture, and processes to facilitate system transparency.
  • Support Data-Driven Decision Making: Provide clean, reliable, and accessible data that enables the organization to make informed business decisions.
© 2024 Teal Labs, Inc