At ERCOT, our diverse and dynamic work environment provides a platform on which employees can work together to build the future of the Texas power grid and wholesale market using the latest technologies and resources. We encourage you to join our talented, dedicated workforce to develop world-class solutions for today's and tomorrow's energy challenges while learning new skills and growing your career.

ERCOT is committed to fostering inclusion at all levels of our company. It is the cornerstone of our corporate values of accountability, leadership, innovation, trust, and expertise. We know that individuals with a wide variety of talents, ideas, and experiences propel the innovation that drives our success. An inclusive and diverse workforce strengthens us and creates a collaborative environment for solving the challenges our industry faces today and in the future.

JOB SUMMARY

Supports and ensures the reliability, performance, and integrity of enterprise data platforms across cloud (Azure Databricks, ADLS, Power BI) and on-premises systems (Oracle, Informatica, SAS, Cognos). Works as part of the Data Operations team to monitor data pipelines, troubleshoot incidents, support migrations and releases, and maintain platform stability. Progressively assumes responsibility for operational ownership, automation, observability, governance, optimization, and reliability engineering practices that support business-critical data workloads.

JOB DUTIES

Level 2 & 3

- Owns day-to-day operations of assigned data platforms or pipeline domains.
- Investigates and resolves moderately complex incidents with minimal supervision.
- Performs root cause analysis for recurring issues and documents corrective actions.
- Coordinates with infrastructure, security, and engineering teams during deployments and incident resolution.
- Supports migration activities, platform upgrades, and release validation.
- Implements monitoring enhancements and automated health checks.
- Participates in disaster recovery testing and data validation exercises.
- Optimizes SQL queries and pipeline performance under guidance.
- Assists in implementing cost-monitoring practices for cloud workloads.
- Communicates operational status and risks to stakeholders.
- Prioritizes workload to meet team service-level objectives.

Level Sr. (in addition to Level 2 & 3 duties)

- Ensures end-to-end reliability of data platforms, including cloud (Azure Databricks, ADLS) and on-premises systems (Oracle, Informatica, SAS, Cognos).
- Leads incident response and root cause analysis for critical data operations issues, driving permanent fixes.
- Architects monitoring and observability frameworks for data pipelines, BI refreshes, and platform health, leveraging golden signals and automated checks.
- Designs and implements automation and infrastructure-as-code solutions to streamline operational workflows and environment provisioning.
- Governs data migrations, releases, and upgrades, ensuring data integrity, rollback strategies, and compliance with change management standards.
- Optimizes data pipelines, queries, and storage strategies for performance, scalability, and cost efficiency across hybrid environments.
- Develops and validates disaster recovery and business continuity plans for data platforms that meet SLA objectives.
- Drives FinOps practices and capacity planning for data workloads, enforcing cost guardrails and resource optimization.
- Implements security and compliance controls for data operations, ensuring audit readiness and adherence to regulatory requirements.
- Serves as Subject Matter Expert (SME) for data reliability and operational excellence.
- Mentors analysts and curates knowledge assets, fostering best practices in governance and reliability.
- Conveys team strategy and operational goals through strong written and verbal communication.
Job Type
Full-time
Career Level
Mid Level