Data Solutions Engineer

Citi
Irving, TX

About The Position

Serve as an integral member of the Data Engineering team, responsible for the design and development of Big Data solutions. Partner with domain experts, product managers, analysts, and data scientists to develop robust Big Data pipelines in Hadoop or Snowflake environments, deliver a data-as-a-service framework, and lead the migration of legacy workloads to cloud platforms. Act as a technical expert on Big Data and Cloud technology stacks, mentoring team members, driving consistent engineering patterns and coding standards, and modernizing SAS-based pipelines in PySpark and Scala. The full set of responsibilities is listed below.

Requirements

  • 5+ years of experience with Hadoop and Big Data technologies
  • Demonstrated proficiency in Python, PySpark, and Scala, including practical experience with fundamental machine learning libraries
  • Experience in developing robust data solutions leveraging Google Cloud or AWS platforms; relevant certifications are preferred
  • Experience with SAS
  • Experience with containerization and related technologies (e.g., Docker, Kubernetes)
  • Comprehensive understanding of software engineering and data analytics
  • In-depth knowledge and hands-on experience with the Hadoop ecosystem and Big Data technologies (e.g., HDFS, MapReduce, Hive, Pig, Impala, Kafka, Kudu, Solr)
  • Knowledge of Agile (Scrum) development methodologies
  • Strong development and automation skills
  • System-level understanding of data structures, algorithms, distributed storage, and compute
  • A proactive approach to solving complex business problems, complemented by strong interpersonal and teamwork skills
  • Familiarity with Hadoop administration and Snowflake
  • Proficiency in Java or experience with Apache Beam
  • Bachelor's degree/University degree or equivalent experience

Responsibilities

  • Design and develop Big Data solutions
  • Partner with domain experts, product managers, analysts, and data scientists to develop robust Big Data pipelines in Hadoop or Snowflake environments
  • Deliver a data-as-a-service framework
  • Lead the migration of all legacy workloads to cloud platforms
  • Engage with key stakeholders to elicit and document requirements, including detailed data flow specifications
  • Assess appropriate solutions and collaborate with relevant teams to drive optimal implementations
  • Work with data scientists to build client pipelines using heterogeneous sources and provide essential engineering services for data science applications
  • Research and evaluate open-source technologies and components, recommending and integrating them into design and implementation efforts
  • Act as a technical expert, mentoring other team members on Big Data and Cloud technology stacks
  • Define comprehensive requirements for maintainability, testability, performance, security, quality, and usability across the data platform
  • Drive the implementation of consistent patterns, reusable components, and coding standards for all data engineering processes
  • Convert SAS-based pipelines to modern frameworks such as PySpark and Scala for execution on Hadoop and non-Hadoop ecosystems
  • Optimize Big Data applications on both Hadoop and non-Hadoop platforms for peak performance
  • Evaluate new IT developments and evolving business requirements, recommending appropriate system alternatives and/or enhancements to current systems through analysis of business processes, systems, and industry standards
  • Drive compliance with applicable laws, rules, and regulations; adhere to Policy; apply sound ethical judgment regarding personal behavior, conduct, and business practices; and escalate, manage, and report control issues with transparency