Data Solutions Engineer

Citi
Irving, TX

About The Position

Serve as an integral member of the Data Engineering team, responsible for the design and development of Big Data solutions. Partner with domain experts, product managers, analysts, and data scientists to develop robust Big Data pipelines in Hadoop or Snowflake environments, deliver a data-as-a-service framework, and lead the migration of legacy workloads to cloud platforms. Act as a technical expert on Big Data and Cloud technology stacks, mentoring team members, driving consistent engineering patterns and coding standards, and modernizing SAS-based pipelines in PySpark and Scala. The full set of responsibilities is listed below.

Requirements

  • 5+ years of experience with Hadoop and Big Data technologies
  • Demonstrated proficiency in Python, PySpark, and Scala, including practical experience with fundamental machine learning libraries
  • Experience in developing robust data solutions leveraging Google Cloud or AWS platforms; relevant certifications are preferred
  • Experience with SAS
  • Experience with containerization and related technologies (e.g., Docker, Kubernetes)
  • Comprehensive understanding of software engineering and data analytics
  • In-depth knowledge and hands-on experience with the Hadoop ecosystem and Big Data technologies (e.g., HDFS, MapReduce, Hive, Pig, Impala, Kafka, Kudu, Solr)
  • Knowledge of Agile (Scrum) development methodologies
  • Strong development and automation skills
  • System-level understanding of data structures, algorithms, distributed storage, and compute
  • A proactive approach to solving complex business problems, complemented by strong interpersonal and teamwork skills
  • Familiarity with Hadoop administration and Snowflake
  • Proficiency in Java or experience with Apache Beam
  • Bachelor's degree/University degree or equivalent experience

Responsibilities

  • Design and develop Big Data solutions
  • Partner with domain experts, product managers, analysts, and data scientists to develop robust Big Data pipelines in Hadoop or Snowflake environments
  • Deliver a data-as-a-service framework
  • Lead the migration of all legacy workloads to cloud platforms
  • Engage with key stakeholders to elicit and document requirements, including detailed data flow specifications
  • Assess appropriate solutions and collaborate with relevant teams to drive optimal implementations
  • Work with data scientists to build client pipelines using heterogeneous sources and provide essential engineering services for data science applications
  • Research and evaluate open-source technologies and components, recommending and integrating them into design and implementation efforts
  • Act as a technical expert, mentoring other team members on Big Data and Cloud technology stacks
  • Define comprehensive requirements for maintainability, testability, performance, security, quality, and usability across the data platform
  • Drive the implementation of consistent patterns, reusable components, and coding standards for all data engineering processes
  • Convert SAS-based pipelines to modern frameworks such as PySpark and Scala for execution on Hadoop and non-Hadoop ecosystems
  • Optimize Big Data applications on both Hadoop and non-Hadoop platforms for peak performance
  • Evaluate new IT developments and evolving business requirements, recommending appropriate system alternatives and/or enhancements to current systems through analysis of business processes, systems, and industry standards
  • Drive compliance with applicable laws, rules, and regulations; adhere to Policy; apply sound ethical judgment regarding personal behavior, conduct, and business practices; and escalate, manage, and report control issues with transparency