Data Engineer

IBR | Rockledge, FL
$120,000 - $145,000 | Remote

About The Position

At Imagine Believe Realize, LLC we are driven by innovation, transformation, and a relentless pursuit of excellence. As an industry leader delivering impactful results, we thrive on solving complex technical challenges and developing cutting-edge solutions that empower our customers and advance critical missions. IBR is a fast-growing company fueled by passion, curiosity, and innovative thinking, where every team member has the opportunity to continuously learn, unlock their full potential, and redefine what is possible in engineering and technology. If you are inspired by innovation, eager to make a difference, and ready to bring your creativity and expertise to a mission-focused team, we invite you to join us, and together we will shape the future. Let’s Make It Real!

The Data Engineer must be able to meet the key criteria below:

  • Location: 100% remote
  • Years' Experience: 5+ years of professional data engineering experience
  • Education: Bachelor's degree in an IT-related field
  • Security Clearance: Must currently hold an active Secret security clearance, as mandated by the government program
  • Employment Type: Full time
  • Work Schedules: IBR promotes work-life balance by offering flexible scheduling options. Standard business hours are aligned to the Eastern Time Zone.

Key Skills

  • 5+ years of IT experience focusing on enterprise data architecture and management, including data flow charts, diagrams, and other technical documentation
  • Experience with Databricks, Structured Streaming, Delta Lake concepts, and Delta Live Tables required
  • Python development experience required
  • Experience with ETL and ELT tools such as SSIS, Pentaho, and/or Data Migration Services, and the ability to incorporate Python as required
  • Advanced-level SQL experience (Joins, Aggregation, Windowing functions, Common Table Expressions, RDBMS schema design, Postgres performance optimization)
  • Proficiency using Git for version control, including repository management, branching, merging, and pull requests
  • Active CompTIA Security+ certification preferred; if selected, must be able to obtain a CompTIA Security+ certification prior to beginning support of the program

Overview

Do you want to help build a portfolio of next-generation data collection systems and enterprise portals? As a Data Engineer at IBR, you will support the Agile-based engineering of robust, secure, and scalable enterprise web portal solutions hosted in AWS. This position works closely with the solutions delivery team to support the operations team performing Deployment, Systems Integration Testing, and Operations & Maintenance activities.

Requirements

  • 5+ years of IT experience focusing on enterprise data architecture and management
  • Must have an active Secret security clearance
  • Bachelor's degree required
  • Experience in Conceptual/Logical/Physical Data Modeling and expertise in Relational and Dimensional Data Modeling
  • Experience with Databricks and Python Development, Structured Streaming, Delta Lake concepts, and Delta Live Tables required
  • Additional experience with Spark, Spark SQL, Spark DataFrames and DataSets, and PySpark
  • Data Lake concepts such as time travel, schema evolution, and optimization
  • Knowledge of Python (Python 3.X) for CI/CD pipelines required
  • Experience leading and architecting enterprise-wide initiatives, specifically system integration, data migration, transformation, data warehouse builds, data mart builds, and data lake implementation/support
  • Advanced-level understanding of streaming data pipelines and how they differ from batch systems
  • Ability to formalize how late data is handled, how windows are defined, and how those definitions affect data freshness (see the streaming sketch after this list)
  • Advanced understanding of ETL and ELT, and of ETL/ELT tools such as SSIS, Pentaho, Data Migration Service, etc.
  • Understanding of concepts and implementation strategies for different incremental data loads, such as tumbling window, sliding window, and high watermark (a high-watermark sketch follows this list)
  • Advanced-level SQL experience (Joins, Aggregation, Windowing functions, Common Table Expressions, RDBMS schema design, Postgres performance optimization); see the SQL sketch after this list
  • Indexing and partitioning strategy experience
  • Ability to debug, troubleshoot, design, and implement solutions to complex technical issues
  • Experience with large-scale, high-performance enterprise big data application deployment and solution delivery
  • Understanding of how to create DAGs to define workflows (see the Airflow sketch following the Nice To Haves section)
  • Experience with Docker, Jenkins, and CloudWatch
  • Ability to write and maintain Jenkinsfiles for supporting CI/CD pipelines
  • Experience working with AWS Lambdas for configuration and optimization
  • Experience working with DynamoDB to query and write data
  • Experience with S3
  • Familiarity with Schema Registry and message formats such as Avro, ORC, etc.
  • Understanding of how to manage ksqlDB SQL files and migrations, as well as Kafka Streams
  • Ability to thrive in a team-based environment
  • Experience briefing the benefits and constraints of technology solutions to technology partners, stakeholders, team members, and senior management
  • Proficiency using Git for version control, including:
      • Repository setup and management
      • Branching strategies (feature, develop, main)
      • Merging and resolving conflicts
      • Creating and reviewing pull requests
      • Commit best practices (clear messages, atomic commits)
      • Tagging and release management
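
The streaming items above boil down to one idea: a watermark tells the engine how long to wait for late records, and the window definition determines when results count as fresh. Below is a minimal PySpark Structured Streaming sketch of that pattern; the table names (bronze.events, silver.event_counts_5m), the event_time column, and the checkpoint path are hypothetical placeholders, not part of this posting.

```python
# Minimal sketch: tumbling-window aggregation with a watermark for late data.
# Table names, column names, and paths are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("late-data-sketch").getOrCreate()

# Stream from a Delta table (Databricks-style source).
events = spark.readStream.table("bronze.events")

counts = (
    events
    # Wait up to 10 minutes for late records; anything later is dropped,
    # which bounds state size and sets the freshness/completeness trade-off.
    .withWatermark("event_time", "10 minutes")
    # Tumbling (non-overlapping) 5-minute windows keyed on event time.
    .groupBy(F.window("event_time", "5 minutes"))
    .count()
)

# In append mode, a window is emitted only once the watermark passes its end.
query = (
    counts.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", "/tmp/checkpoints/late_data_sketch")
    .toTable("silver.event_counts_5m")
)
```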
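
For the advanced SQL requirement, here is a small, self-contained illustration of a CTE feeding window functions, run through Spark SQL so it executes as-is; the orders table and its columns are made up for the example, and the SQL itself is portable to Postgres.

```python
# Minimal sketch: Common Table Expression + window functions.
# The `orders` data below is fabricated purely for illustration.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-sketch").getOrCreate()

spark.createDataFrame(
    [(1, "a", 10.0), (2, "a", 25.0), (3, "b", 5.0)],
    ["order_id", "customer_id", "amount"],
).createOrReplaceTempView("orders")

spark.sql("""
    WITH ranked AS (                               -- CTE
        SELECT customer_id,
               amount,
               ROW_NUMBER() OVER (                 -- windowing function
                   PARTITION BY customer_id
                   ORDER BY amount DESC) AS rn,
               SUM(amount) OVER (
                   PARTITION BY customer_id) AS customer_total
        FROM orders
    )
    SELECT customer_id, amount, customer_total
    FROM ranked
    WHERE rn = 1                                   -- top order per customer
""").show()
```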
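
And for the incremental-load item, a sketch of the high-watermark strategy: persist the largest value already loaded, then pull only rows beyond it on the next run. The control and source table names and the updated_at column are assumptions for illustration, and the UPDATE at the end presumes Delta tables, as on Databricks.

```python
# Minimal sketch: high-watermark incremental load.
# All table and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("watermark-sketch").getOrCreate()

# 1. Fetch the watermark persisted by the previous run.
last_wm = (
    spark.table("etl_control.watermarks")
    .filter(F.col("table_name") == "orders")
    .agg(F.max("watermark_value"))
    .collect()[0][0]
)

# 2. Select only rows newer than the watermark.
incremental = (
    spark.table("source.orders")
    .filter(F.col("updated_at") > F.lit(last_wm))
)

# 3. Append the new slice, then advance the watermark.
incremental.write.format("delta").mode("append").saveAsTable("lake.orders")

new_wm = incremental.agg(F.max("updated_at")).collect()[0][0]
if new_wm is not None:  # skip when nothing new arrived
    spark.sql(
        f"UPDATE etl_control.watermarks "
        f"SET watermark_value = '{new_wm}' "
        f"WHERE table_name = 'orders'"  # UPDATE requires a Delta table
    )
```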

Nice To Haves

  • Active CompTIA Security+ certification preferred; if selected, must be able to obtain a CompTIA Security+ certification prior to beginning support of the program
  • Structured Streaming and Delta Live Tables with Databricks a bonus
  • Familiarity with Pytest and Unittest a bonus
  • Familiarity and/or expertise with Great Expectations or other data quality/data validation frameworks a bonus
  • Familiarity with CI/CD pipelines, containerization, and pipeline orchestration tools such as Airflow, Prefect, etc., a bonus but not required (see the DAG sketch after this list)
  • Architecture experience in AWS environment a bonus
  • Familiarity working with Kinesis and/or Lambda, specifically how to push and pull data, how to use AWS tools to view data in Kinesis streams, and how to process massive data at scale, a bonus
  • Experience working with JSON and defining JSON Schemas a bonus
  • Experience setting up and managing Confluent/Kafka topics and ensuring performance using Kafka a bonus
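
Since the posting names Airflow as one acceptable orchestration tool, here is a minimal sketch of defining a workflow as a DAG with it; the DAG id, schedule, and task bodies are placeholders, and the schedule= keyword assumes Airflow 2.4+ (older releases use schedule_interval=).

```python
# Minimal sketch: an extract -> transform -> load workflow defined as a DAG.
# Task bodies are stubs; dag_id and schedule are illustrative.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("pull raw data")


def transform():
    print("clean and reshape")


def load():
    print("write to the lake")


with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+; schedule_interval on older versions
    catchup=False,
) as dag:
    t1 = PythonOperator(task_id="extract", python_callable=extract)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t3 = PythonOperator(task_id="load", python_callable=load)

    # The >> operator declares the DAG edges.
    t1 >> t2 >> t3
```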

Responsibilities

  • Plan, create, and maintain data architectures, ensuring alignment with business requirements
  • Obtain data, formulate dataset processes, and store optimized data
  • Identify problems and inefficiencies and apply solutions
  • Determine tasks where manual participation can be eliminated through automation
  • Identify and optimize data bottlenecks, leveraging automation where possible
  • Create and manage data lifecycle policies (retention, backup/restore, etc.)
  • Apply in-depth knowledge to create, maintain, and manage ETL/ELT pipelines
  • Create, maintain, and manage data transformations
  • Maintain/update documentation
  • Create, maintain, and manage data pipeline schedules
  • Monitor data pipelines
  • Create, maintain, and manage data quality gates (Great Expectations) to ensure high data quality (a sketch follows this list)
  • Support AI/ML teams with optimizing feature engineering code
  • Apply expertise in Spark, Python, Databricks, data lakes, and SQL
  • Create, maintain, and manage Spark Structured Streaming jobs, including the newer Delta Live Tables and/or dbt
  • Research existing data in the data lake to determine best sources for data
  • Create, manage, and maintain ksqlDB and Kafka Streams queries/code
  • Perform data-driven testing for data quality
  • Maintain and update Python-based data processing scripts executed on AWS Lambda (see the Lambda sketch after this list)
  • Write unit tests for all Spark, Python data processing, and Lambda code
  • Maintain and optimize the PCIS Reporting Database data lake (performance tuning, etc.)
  • Streamline data processing, including formalizing how late data is handled, how windows are defined, and how window definitions impact data freshness
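
As a rough illustration of the Great Expectations quality gate mentioned above: declare expectations against a batch of data and fail the pipeline when any of them is not met. This uses the legacy Pandas-dataset entry point (ge.from_pandas); exact APIs differ across Great Expectations versions, so treat the calls as indicative rather than exact.

```python
# Minimal sketch: a fail-fast data quality gate with Great Expectations.
# Uses the legacy Pandas API; column names and thresholds are made up.
import great_expectations as ge
import pandas as pd

df = pd.DataFrame({"order_id": [1, 2, 3], "amount": [10.0, 25.0, 5.0]})
gdf = ge.from_pandas(df)

# Each expectation returns a validation result with a `success` flag.
checks = [
    gdf.expect_column_values_to_not_be_null("order_id"),
    gdf.expect_column_values_to_be_between("amount", min_value=0),
]

# Gate: abort the load rather than push bad data downstream.
if not all(check.success for check in checks):
    raise ValueError("Data quality gate failed; aborting load")
```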
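
And a sketch of the kind of Python-based Lambda processing script referenced above: triggered by an S3 put event, it parses the JSON object and writes a summary item to DynamoDB. The environment variable name, the event shape for S3 triggers, and the item keys are assumptions for illustration.

```python
# Minimal sketch: S3-triggered Lambda that parses JSON and writes to DynamoDB.
# SUMMARY_TABLE and the item attributes are hypothetical.
import json
import os

import boto3

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table(os.environ["SUMMARY_TABLE"])


def handler(event, context):
    # S3 put notifications carry bucket/key under Records[0].s3.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]

    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    payload = json.loads(body)

    # Store a small summary item keyed on the object name.
    table.put_item(Item={"pk": key, "record_count": len(payload)})
    return {"statusCode": 200}
```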

Benefits

  • Nationwide medical, dental, and vision insurance
  • 3 weeks of Paid Time Off and 11 Paid Federal Holidays
  • 401k matching
  • Life Insurance, Short-Term Disability, and Long-Term Disability at no cost to our employees
  • Supplemental insurance options
  • Flexible spending accounts and Dependent Care spending accounts
  • Wellness incentives
  • Reimbursement for professional development and certifications
  • Access to training assistance opportunities to support career growth and progression
  • Hybrid and Remote work opportunities to support work-life balance