We are expanding our efforts into complementary data technologies for analytics and decision support, specifically in ingesting and processing large data sets. Our interests lie in enabling data science and search-based applications on large, low-latency data sets, processed in both batch and streaming contexts. To that end, this role will work with team counterparts to explore, develop, and deploy technologies that create data sets through a combination of batch and streaming transformation processes. Some of these data sets support offline and in-line machine learning training and model execution; others support search-engine-based analytics. Exploring and deploying these technologies includes identifying opportunities that affect business strategy, selecting data solution software, and defining hardware requirements based on business requirements. The role is also responsible for coding, testing, and documenting new or modified scalable analytic data systems, including automation for deployment and monitoring. Together with team counterparts, this role helps architect an end-to-end framework built on a core set of data technologies. Other aspects of the role include developing standards and processes for data engineering projects and initiatives.
Evaluate, research, and experiment with data engineering technologies in a lab to keep pace with industry innovation, while assessing business impact and viability for the use cases associated with current efforts
Work with data engineering related groups to inform them about and showcase the capabilities of emerging technologies, and to enable adoption of these technologies and their associated techniques
Define and refine processes and procedures for the data engineering practice
Work closely with data scientists, data architects, ETL developers, other IT counterparts, and business partners to identify, capture, collect, and format data from external sources, internal systems, and the data warehouse to extract features of interest
Code, test, deploy, monitor, document, and troubleshoot data engineering processing and its associated automation
Define data engineering architecture, both hardware and software, that reflects business requirements, for inclusion in the end-to-end solution architecture
Educate and mentor ETL developers in data engineering to enable their transition into the data engineer role and practice
Conduct code reviews, suggest improvements, and support technology upgrades for the common libraries; hand them over to the corresponding development teams for quality checks and support those teams until deployment into production
Support ETL developers and Operations teams in troubleshooting incidents for root cause analysis, and assist in developing solutions that meet service level agreements
Work with Operations teams in Big Data, IT, and Information Security to monitor and troubleshoot incidents and maintain service levels
Contribute to the evolving distributed systems architecture to meet changing requirements for scaling, reliability, performance, manageability, and cost
Report utilization and performance metrics to user communities
Contribute to planning and implementing new or upgraded hardware and software releases
Monitor the Linux, Hadoop, and Spark communities and vendors, and report important defects, feature changes, and/or enhancements to the team
Research and recommend innovative and, where possible, automated approaches to administration tasks
Identify ways to make resource utilization more efficient, achieve economies of scale, and simplify support issues
Job Type
Full-time
Career Level
Manager
Number of Employees
1,001-5,000 employees