Data Operations Analyst Sr

ERCOT, Austin, TX

About The Position

At ERCOT, our diverse and dynamic work environment provides a platform on which employees can work together to build the future of the Texas power grid and wholesale market utilizing the latest technologies and resources. We encourage you to join our talented, dedicated workforce to develop world-class solutions for today’s and tomorrow’s energy challenges while learning new skills and growing your career.

ERCOT is committed to fostering inclusion at all levels of our company. It is the cornerstone of our corporate values of accountability, leadership, innovation, trust, and expertise. We know that individuals with a wide variety of talents, ideas, and experiences propel the innovation that drives our success. An inclusive and diverse workforce strengthens us and allows for a collaborative environment to solve the challenges that face our industry today and in the future.

Job Summary

Supports and ensures the reliability, performance, and integrity of enterprise data platforms across cloud (Azure Databricks, ADLS, Power BI) and on-premises systems (Oracle, Informatica, SAS, Cognos). Works as part of the Data Operations team to monitor data pipelines, troubleshoot incidents, support migrations and releases, and maintain platform stability. Progressively assumes responsibility for operational ownership, automation, observability, governance, optimization, and reliability engineering practices to support business-critical data workloads.

Job Duties

Duties build by level. Levels 2 and 3 cover the items listed under Responsibilities from "Owns day-to-day operations" through "Prioritizes workload to meet team service-level objectives." The Sr. level adds, in addition to the Level 2 and 3 duties, the remaining items from "Ensures end-to-end reliability of data platforms" onward.

Requirements

  • Bachelor’s degree in Computer Science, Information Systems, or a related field, or an equivalent combination of education and experience.
  • 5+ years of experience in data/platform operations or site reliability with enterprise scope.
  • Proven track record leading complex operational initiatives and incident responses.
  • Extensive experience with enterprise-scale cloud and on-premises data platforms.

Nice To Haves

  • Experience designing and implementing observability, automation, and reliability solutions.
  • Background in ITIL process implementation and operational governance.

Responsibilities

  • Owns day-to-day operations of assigned data platforms or pipeline domains.
  • Investigates and resolves moderately complex incidents with minimal supervision.
  • Performs root cause analysis for recurring issues and documents corrective actions.
  • Coordinates with infrastructure, security, and engineering teams during deployments and incident resolution.
  • Supports migration activities, platform upgrades, and release validation.
  • Implements monitoring enhancements and automated health checks.
  • Participates in disaster recovery testing and data validation exercises.
  • Optimizes SQL queries and pipeline performance under guidance.
  • Assists in implementing cost-monitoring practices for cloud workloads.
  • Communicates operational status and risks to stakeholders.
  • Prioritizes workload to meet team service-level objectives.
  • Ensures end-to-end reliability of data platforms, including cloud (Azure Databricks, ADLS) and on-premises systems (Oracle, Informatica, SAS, Cognos).
  • Leads incident response and root-cause analysis for critical data operations issues, driving permanent fixes.
  • Architects monitoring and observability frameworks for data pipelines, BI refreshes, and platform health, leveraging golden signals and automated checks.
  • Designs and implements automation and infrastructure-as-code solutions to streamline operational workflows and environment provisioning.
  • Governs data migrations, releases, and upgrades, ensuring integrity, rollback strategies, and compliance with change management standards.
  • Optimizes data pipelines, queries, and storage strategies for performance, scalability, and cost efficiency across hybrid environments.
  • Develops and validates disaster recovery and business continuity plans for data platforms, meeting SLA objectives.
  • Drives FinOps practices and capacity planning for data workloads, enforcing cost guardrails and resource optimization.
  • Implements security and compliance controls for data operations, ensuring audit readiness and adherence to regulatory requirements.
  • Serves as Subject Matter Expert (SME) for data reliability and operational excellence.
  • Mentors analysts and curates knowledge assets, fostering best practices in governance and reliability.
  • Conveys team strategy and operational goals through strong written and verbal communication.

Benefits

  • ERCOT offers an excellent benefits package, which includes health, dental, vision, life insurance, long/short-term disability insurance, long-term care insurance, Section 125 Flexible Spending Account, and a Retirement Savings Plan.
  • The medical, dental, vision, and quality-of-life benefits offered by ERCOT are second to none.
  • Full-time employees are eligible to enroll in benefits on the first of the month following the date of hire.
  • Additionally, 401(k) plans are available to help employees plan for the future.