Site Reliability Administrator (Network/Platform/Database)

Portland General Electric
Portland, OR

About The Position

At PGE, our work involves dreaming about, planning for, and realizing a smarter, cleaner, more enduring Oregon neighborhood. It's core to our DNA and we haven’t stopped since we started in 1888. We energize lives, strengthen communities and drive advancements in energy that promote social, economic and environmental progress. We’re always on the lookout for people passionate about leading and being a part of teams that are advancing innovative clean energy solutions that are also affordable and accessible to all.

Job Overview

The Site Reliability Administrator plays a critical role in supporting the reliability, stability, and continuous improvement of the organization’s core IT systems and services. Leveraging foundational knowledge across database, platform, and network environments, this position helps ensure that systems are properly maintained, proactively monitored, and continuously enhanced in alignment with established standards. By minimizing service disruptions and strengthening operational resilience, the role directly contributes to effective incident response and dependable technology outcomes that enable day-to-day business operations.

As part of a newly formed team, this position offers the opportunity to help build a strong operational foundation and shape emerging reliability practices. The Site Reliability Administrator collaborates closely with peers, managers, and cross-functional stakeholders to support consistent service delivery while promoting shared accountability across the three core pillars: network, platform, and database. This role also supports cross-training efforts to expand team capability and ensure balanced, flexible support coverage, fostering a culture of continuous learning and long-term system reliability.

Requirements

  • Requires a bachelor’s degree in IT, business, computer science, engineering management or other related field or equivalent experience.
  • Typically five or more years of experience in the development, implementation and maintenance of IT systems.
  • Experience in Windows/Linux servers, virtualization and networking fundamentals.
  • ITIL Foundation and Microsoft Certification preferred
  • Intermediate knowledge and experience working with operating systems
  • Intermediate knowledge and experience working with network, platform or database systems
  • Intermediate knowledge of work management tools
  • Intermediate knowledge of ITIL or other service operations framework
  • Advanced knowledge of patch management tools
  • Advanced scripting language skills (PowerShell, Python, Bash)
  • Advanced problem-solving skills
  • Advanced analytical thinking skills
  • Advanced accuracy skills
  • Advanced risk management skills
  • Advanced oral and written communication skills
  • Advanced collaboration skills
  • Advanced organization and prioritization skills
  • Advanced time management skills
  • Ability to adhere to response times, deadlines, and time-sensitive tasks
  • Ability to communicate and solve problems under stress
  • Ability to respond and adapt to frequent change
  • Ability to collaborate effectively with peers, managers, and stakeholders
  • Ability to process new information and apply it consistently
  • Ability to accept feedback and demonstrate self-awareness
  • Ability to adhere to established schedules and attendance standards
  • Ability to work variable hours and long hours as needed
  • Ability to support shift schedules
  • Ability to support after-hours on-call with 15-minute response times
  • Ability to report to work during severe inclement weather
  • Ability to drive into the office within a two-hour drive time if needed
  • Valid driver’s license required
  • Occasional travel and overnight travel may be required
  • Regular computer use throughout the work shift

Nice To Haves

  • Experience with cloud platforms (AWS, Azure, GCP) preferred.
  • Experience with Oracle and/or SQL preferred.

Responsibilities

  • Provides technical expertise to enable the correct application of operational procedures.
  • Schedules and executes maintenance windows to minimize service disruption.
  • Uses network management tools to determine network load and performance statistics.
  • Contributes to the planning and implementation of maintenance and installation work, including patching and updates to servers, operating systems and infrastructure components.
  • Implements agreed network changes and maintenance routines.
  • Validates patch deployments to ensure compliance with security and operational standards.
  • Proactively monitors system health, identifies potential operational problems and contributes to their resolution, checking that they are managed in accordance with agreed standards and procedures.
  • Conducts reliability reviews and provides proposals for continuous improvement to specialists, users and managers.
  • Prioritizes and diagnoses incidents according to agreed procedures.
  • Investigates causes of incidents and seeks resolution within established service level agreements (SLAs).
  • Escalates unresolved incidents.
  • Facilitates recovery following resolution of incidents.
  • Implements corrective actions to prevent recurrence.
  • Documents and closes resolved incidents according to agreed procedures.
  • Initiates and monitors actions to investigate and resolve problems in systems, processes and services.
  • Determines problem fixes/remedies.
  • Assists with the implementation of agreed remedies and preventative measures.
  • Develops scripts and automation tools to streamline repetitive tasks (e.g., patching, monitoring, reporting).
  • Maintains documentation for automated processes and workflows.
  • Identifies opportunities for operational efficiency and implements automation solutions.
  • Reviews system software updates and identifies those that merit action.
  • Tailors system software to maximize hardware functionality.
  • Installs and tests new versions of system software.
  • Investigates and coordinates the resolution of potential and actual service problems.
  • Prepares and maintains operational documentation for system software.
  • Advises on the correct and effective use of system software.
  • Develops, documents and implements changes based on requests for change.
  • Applies change control processes and procedures.
  • Applies tools, techniques and processes to manage and report on change requests.
  • Communicates effectively with Tier 1 support and stakeholders during incidents and maintenance activities, and documents incident resolutions, maintenance procedures and automation scripts.
  • Facilitates and ensures knowledge transfer.
  • Performs defined tasks to monitor service delivery against service level agreements and maintains records of relevant information.
  • Analyzes service records against agreed service levels regularly to identify actions required to maintain or improve levels of service and initiates or reports these actions.
  • Assists with implementing and monitoring security policies and protocols across different systems.
  • Contributes to identifying and addressing potential risks in security governance and compliance.
  • Assesses and develops strategies for patching and maintenance solutions that ensure security compliance, minimize service disruption and mitigate risks.