Sr. Software Engineer

CDK Global•Austin, TX

5h•Remote

About The Position

Design and develop EMR pipelines using AWS services such as SQS queues, EC2 instances, AWS Data Pipeline, S3 buckets, AWS Glue, RDS, and others. Create extract-transform-load (ETL) EMR pipelines based on Hadoop, Hive, the YARN resource manager, NiFi, Spark, and Python frameworks in AWS. Interpret the data mapping document to identify source systems such as SQL Server, and develop the required Spark transformations for ingesting ETL data into the TITAN platform. Optimize Spark jobs using PySpark after a complete analysis of multiple parameters and of opportunities to improve the target systems, along with data quality checks. Create Spark DataFrames/RDDs and load the data in different formats, including JSON, Parquet, Avro, CSV, and others. Evaluate new architectures and technologies such as Snowflake, Debezium, and other tools to improve the performance and efficiency of ETL tasks. Complete data requests from CDK product customers and help them debug and resolve any data quality issues in a timely manner. Work closely with CDK customers and provide feedback to the CDK service team to improve the reliability of our products. Work within scaled agile methodologies to increase the quality of deliverables. Monitor and resolve production L1/L2 issues. 100% telecommuting.

Requirements

Bachelor’s degree or foreign equivalent in Computer Science, Information Technology, Computer Engineering or a related field plus 5 years of professional experience as a Software Developer or related occupation.
Alternatively, a Master’s degree or foreign equivalent in Computer Science, Information Technology, Computer Engineering or a related field plus 3 years of professional experience as a Software Developer or related occupation.
Employment experience with: Designing, developing, and migrating data pipelines to the latest data frameworks such as Databricks.
Employment experience with: Creating extract-transform-load (ETL) pipelines based on Hadoop, Hive, NiFi, Spark, and Python frameworks on cloud platforms (AWS/Snowflake).
Employment experience with: Mapping data between source systems and the data lake, and developing the required data transformations for ingesting source data into the data lake (TITAN Platform).
Employment experience with: Creating Spark DataFrames/RDDs and loading data in different formats such as JSON, Parquet, Avro, and CSV.
Employment experience with: Evaluating new architectures and technologies such as Snowflake, Debezium, Kafka, and other tools to improve the performance and efficiency of ETL tasks.

Responsibilities

Design and develop EMR pipelines using AWS services (SQS queues, EC2 instances, AWS Data Pipeline, S3 buckets, AWS Glue, RDS).
Create ETL EMR pipelines based on Hadoop, Hive, the YARN resource manager, NiFi, Spark, and Python frameworks in AWS.
Interpret data mapping documents to identify source systems (e.g., SQL Server) and develop Spark transformations for ingesting ETL data into the TITAN platform.
Optimize Spark jobs using PySpark, analyzing parameters and opportunities to improve target systems and data quality.
Create Spark DataFrames/RDDs and load data in various formats (JSON, Parquet, Avro, CSV).
Evaluate new architectures and technologies (e.g., Snowflake, Debezium) to enhance ETL performance and efficiency.
Complete data requests from CDK product customers and help them debug and resolve data quality issues.
Collaborate with CDK customers and provide feedback to the CDK service team to improve product reliability.
Work within scaled agile methodologies to enhance deliverable quality.
Monitor and resolve production issues.