Data Engineer [Multiple Positions Available]

JPMorgan Chase · Plano, TX
Posted 3 days ago · Onsite

About The Position

DESCRIPTION: Perform solution architecture and design and develop data ingestion processes for Machine Learning pipelines; evaluate new and current technologies against emerging model feature engineering standards and frameworks; and build cohesive MLOps and DataOps pipelines for scalability, reliability, and resiliency. The full duties are listed under Responsibilities below.

QUALIFICATIONS: Bachelor's degree in Electronic Engineering, Computer Engineering, Computer Science, or a related field of study plus 7 years of experience in the job offered or as Data Engineer, IT Project Architect, IT Consultant, Application Developer, Software Engineer, or related occupation. The required skills, broken out by years of experience, are listed under Requirements below.

Requirements

  • Bachelor's degree in Electronic Engineering, Computer Engineering, Computer Science or related field of study plus 7 years of experience in the job offered or as Data Engineer, IT Project Architect, IT Consultant, Application Developer, Software Engineer, or related occupation.
  • 7 years of experience utilizing Data Lake and Delta Lake Management Architecture for AI and ML enablement.
  • 7 years of experience designing and implementing data lake management architecture for AI-driven solutions, including both traditional Data Lakes and Delta Lakes for optimized data storage and processing.
  • 7 years of experience in technology, big data analysis, and ML features domain consulting.
  • 7 years of experience analyzing, designing, and conducting proof of concepts (POC) to validate architectural decisions and data strategies.
  • 7 years of experience delivering incremental solutions using an Agile approach, ensuring continuous integration and delivery.
  • 7 years of experience implementing transformations on big data platforms using the Python, PySpark, and Scala programming languages, as well as NoSQL databases, Teradata, DB2, Hadoop, Snowflake, and SAS BI tools, with a focus on leveraging Delta Lake for ACID transactions and scalable data processing (a minimal Delta Lake sketch follows this list).
  • 5 years of experience utilizing Databricks together with AWS and Azure data processing tools to support ML model training.
  • 5 years of experience utilizing data transformation tools including AWS Glue, EMR, EKS, Redshift, MSK (Managed Streaming for Apache Kafka), AWS Kinesis, and Databricks for collaborative data engineering and machine learning workflows.
  • 5 years of experience handling terabyte-sized datasets with multi-threading in PySpark on cloud platforms, utilizing Databricks for enhanced performance and scalability.
  • 5 years of experience utilizing cloud computing platforms including Azure or AWS, integrating Databricks for seamless data processing and analytics.
  • 3 years of experience using event-driven architecture (EDA) and real-time streaming to identify fraud proactively.
  • 3 years of experience building event-driven architectures with event streaming on Apache Kafka and AWS MSK for real-time feature engineering (see the streaming sketch after this list).
  • 3 years of experience developing end-to-end pipelines using Python and PySpark to support Data Lake, Data Warehouse, and ML models, leveraging Databricks for model training and deployment.
  • 1 year of experience applying data exploration techniques to analyze customer behavior and find actionable, domain-specific insights, utilizing algorithms to explore large collections of customer transactions and reveal hidden relationships among entities, ensuring comprehensive data insights.
  • 1 year of experience maintaining governance, reproducibility, and scalability of models, while optimizing workflows for efficiency.
  • Experience utilizing AWS Kinesis for real-time data streaming and processing, to ensure low-latency and high-throughput data pipelines.
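
To make the Delta Lake and PySpark expectations above concrete, here is a minimal sketch of the kind of transformation-and-persist step such a pipeline might contain. It assumes a Spark environment with Delta Lake available (built in on Databricks); the S3 paths, column names, table layout, and aggregation logic are hypothetical illustrations, not a prescribed implementation.

```python
# Minimal PySpark sketch: transform raw transaction data and persist it as a
# Delta Lake table so downstream ML feature pipelines get ACID guarantees.
# Paths, column names, and the output location are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("feature-ingestion-sketch")
    # Delta Lake support is assumed (built in on Databricks; otherwise the
    # delta-spark package must be installed and configured as below).
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Ingest raw transactions from a data-lake landing zone (hypothetical path).
raw = spark.read.parquet("s3://example-landing-zone/transactions/")

# Derive simple per-customer daily aggregates as candidate ML features.
features = (
    raw.withColumn("txn_date", F.to_date("txn_timestamp"))
       .groupBy("customer_id", "txn_date")
       .agg(
           F.count("*").alias("txn_count"),
           F.sum("amount").alias("total_amount"),
       )
)

# Write to a Delta table; each write is an ACID transaction, and the table can
# be time-traveled or incrementally merged into by later pipeline runs.
(features.write
         .format("delta")
         .mode("overwrite")
         .partitionBy("txn_date")
         .save("s3://example-feature-store/customer_daily_features/"))
```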
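Similarly, for the event-driven, real-time feature engineering items above, a Spark Structured Streaming job consuming from Apache Kafka (such as AWS MSK) is one common shape. This is only a sketch under assumed broker addresses, topic name, event schema, and output paths; it requires the spark-sql-kafka connector and a Delta-capable sink, and none of these names come from the posting itself.

```python
# Minimal Spark Structured Streaming sketch: consume transaction events from
# Apache Kafka (e.g. AWS MSK) and maintain near-real-time, windowed features
# of the kind a fraud-detection model might use. All identifiers below are
# hypothetical.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import (StructType, StructField, StringType,
                               DoubleType, TimestampType)

spark = SparkSession.builder.appName("realtime-features-sketch").getOrCreate()

event_schema = StructType([
    StructField("customer_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Subscribe to the Kafka topic; needs the spark-sql-kafka connector package.
events = (
    spark.readStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "broker-1.example.amazonaws.com:9092")
         .option("subscribe", "transactions")
         .load()
         .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
         .select("e.*")
)

# Windowed per-customer aggregates, e.g. spend and transaction count over the
# last 10 minutes, usable as low-latency model inputs.
features = (
    events.withWatermark("event_time", "15 minutes")
          .groupBy(F.window("event_time", "10 minutes"), "customer_id")
          .agg(F.sum("amount").alias("spend_10m"),
               F.count("*").alias("txn_10m"))
)

# Stream results to a Delta table (or another sink) for model consumption.
query = (
    features.writeStream
            .format("delta")
            .outputMode("append")
            .option("checkpointLocation", "s3://example-checkpoints/realtime-features/")
            .start("s3://example-feature-store/realtime_features/")
)
query.awaitTermination()
```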

Responsibilities

  • Perform solution architecture, and design and develop data ingestion processes for Machine Learning pipelines.
  • Evaluate new and current technologies using emerging model feature engineering standards and frameworks.
  • Provide technical guidance and direction to support the business and its technical teams, contractors, and vendors.
  • Contribute to the engineering community as an advocate of firm-wide data frameworks, tools, and practices in the AI and ML Development Life Cycle.
  • Influence peers and project decision-makers to consider the use and application of leading-edge technologies.
  • Apply advanced analytics techniques to identify, analyze, and interpret trends or patterns in complex data sets enabling superior machine learning model outcomes.
  • Innovate new ways of managing, transforming, and validating Machine Learning model outputs.
  • Establish and enforce guidelines to ensure consistency, quality, and completeness of Machine Learning feature data assets.
  • Act as the coach and mentor to team members on their assigned project tasks.
  • Develop a cohesive MLOps and DataOps pipeline to ensure scalability, reliability, and resiliency.
  • Conduct product work reviews with team members.