This position is on the Personal Investor Data & Analytics - Data Reliability team, which is responsible for data reliability (accurate, available, performant, and resilient) from the point of entry into the lake until consumption by the user (PI business units). The role includes helping define best practices for observability, establishing and maintaining service level indicators (SLIs) and service level objectives (SLOs), tracking and addressing toil, conducting blameless root-cause post-mortems, and incorporating preventative and proactive Reliability practices, among other items. This individual will partner with Data Engineers, Data Analysts, and source Product Team Engineers to identify root causes, resolve issues, optimize existing systems, enhance infrastructure, and promote automation to reduce effort and increase reliability.

Responsibilities:
Proactively analyzes data pipeline and platform logs and metrics to identify trends and potential issues.
Participates in special projects and performs other duties as assigned.
Gains insight into PI Data & Analytics operations, demonstrates and champions Reliability culture and practices, builds relationships, and promotes Reliability as a way of thinking.
Exhibits proficiency in data reliability, scalability, performance, security, enterprise system architecture, toil reduction, and other Reliability best practices.
Communicates progress, issues, trends, and solutions to management and partner organizations.
Maintains up-to-date knowledge and understanding of pending elevations, enhancements, and infrastructure changes.
Proactively identifies potential failure points and designs strategies to keep failures localized, preventing widespread disruption and contagion.
Collaborates with internal teams to evaluate the health, stability, and reliability of systems and platforms.
Collaborates with product teams on triage and troubleshooting during client-impacting incidents.
Participates in and/or facilitates post-incident reviews for any client-impacting events local to the Personal Investor Data & Analytics products.
Maintains the centralized incident response playbook, in collaboration with DRE Champions on each product team.
Collaborates with DRE Champions and/or product team points of contact to ensure adherence to the common operating model and standard development playbooks.
Job Type
Full-time
Career Level
Mid Level