Big Data Support Engineer Lead - Vice President

Citi • Irving, TX

59d

About The Position

Developed communication and diplomacy skills are required to guide, influence, and convince others, in particular colleagues in other areas and occasional external customers. The role has a significant impact on the area through complex deliverables.

Job Family Group: Technology
Job Family: Applications Support
Time Type: Full time
Anticipated Posting Close Date: Dec 23, 2025

For complementary skills, please see the Requirements section and/or contact the recruiter.

Requirements

10+ years of hands-on overall IT experience, including 2 or more years with one or more cloud technologies, running services on OpenShift, AWS, or Google Cloud.
Good understanding of the Data Engineering function and role, and of the tools and technologies used in one or more of the following: Ab Initio, Big Data, Master Data Management (MDM), and hybrid cloud.
Strong knowledge of CI/CD tools for automated code deployments.
Strong knowledge of SOAP and REST APIs and microservices.
Knowledge of creating observability dashboards using Splunk, AppDynamics, ELK, and Grafana.
Working knowledge of Ansible scripts for automation.
Track record of successfully triaging issues and driving them to resolution.
Ability to work under pressure and manage deadlines or unexpected changes in expectations or requirements.
Availability "on call" or on a "shift basis" for off-hours production support is expected.
Ability to handle multiple, competing priorities simultaneously.
Ability to work with offshore and onsite production support teams across multiple organizations.
Good knowledge of disaster recovery processes across data centers.
Strong analytical and problem-solving skills, and the ability to logically break down tasks into smaller, manageable parts.
Ability to communicate and negotiate at all levels and to influence people.
Effective meeting management, team management and organizational skills.
Effective presentation skills, including creating visual presentations in Microsoft PowerPoint.
Ability to interact with individuals at all organizational levels.
Bachelor's or Master's degree in engineering or computer science.
Prior experience in Data Warehouse and Business Intelligence applications.
Strong understanding of, and experience in leading, all aspects of Incident Management, Problem Management, Service Improvements, Monitoring and Observability instrumentation, SRE (Site Reliability Engineering) frameworks and adoption, disaster recovery and resiliency, and automation of production services.
Leads production monitoring and the implementation of observability using AppD, Splunk, and Grafana; strong knowledge of monitoring tools used in the industry.
Collaborates with development, architecture, and infrastructure teams and leads service improvement plans.
Supports the delivery of the L2 Service Delivery and SRE (Site Reliability Engineering) objectives for the business/region.
Leads the team and contributes toward achieving service performance against targets for the organization.
Strong bias towards automation and use of the SRE framework.

Responsibilities

Incident Management: Perform incident triage and root cause analysis, and collect and validate business impact.
Service Management: Collaborate with the Technology organization, manage service risk/maturity assessments, and drive Service Improvement Plans.
Knowledge Management: Develop and test knowledge objects to support increased L0, L1, and L2 resolution.
Change Management: Review and approve changes.
Capacity Management: Review capacity across service components.
Continuity Management: Schedule and facilitate COB testing, and maintain recovery plans.
Configuration Management: Build/update service configuration.
Third-Party Asset Management: Manage third-party assets (licensing compliance/optimization).
Service Readiness: Review major releases and new application installs from the earliest stages of the project/program to ensure risks are documented and remediated before production go-live.
Service Risks: Identify, document, and manage service risks within applications, and effectively manage their resolution.
Monitoring: Collaborate and engage with various teams to enable monitoring and observability of production services.