Reliability & Monitoring Engineer

Nextpower • Nashville, TN

About The Position

Position Overview / Role Purpose

The Reliability & Monitoring Engineer is responsible for fleet-level monitoring, incident analysis, and reliability insights for Nextracker-supported utility-scale solar tracker systems. This role provides real-time system visibility, post-event analysis, and actionable intelligence that support rapid recovery and long-term asset reliability, particularly following severe weather and other high-impact events. Operating within a portfolio-based support model, the Reliability & Monitoring Engineer translates monitoring data into clear technical insights that improve system uptime, inform customer communication, and strengthen long-term asset performance. This is a desk-based role within the Nextpower organization, focused on proactive monitoring, analytical investigation, and continuous operational improvement, working closely with the U.S. Technical Services organization and the Manager, Remote Monitoring & Asset Resilience (U.S.).

Key Objectives

Deliver High-Quality Fleet Monitoring: Continuously monitor utility-scale tracker fleets to detect abnormal system behavior, communication failures, and offline assets across customer portfolios.
Lead Incident Analysis & Root Cause Investigation: Perform structured incident analysis and Root Cause Analysis (RCA) for alarms, outages, and post-weather events, producing clear, technically sound findings.
Support Technical Services & Customer Communication: Provide monitoring-based insights and documentation that enhance Technical Services’ ability to resolve issues quickly and communicate effectively with customers.
Drive Reliability Insights & Operational Improvement: Identify recurring issues and systemic risks, and contribute to the refinement of monitoring thresholds, alert logic, and operational playbooks that improve asset resilience.

Requirements

Bachelor’s degree in Electrical Engineering, Energy Engineering, Renewable Energy, or a related technical field; equivalent relevant experience will be considered.
2+ years of experience in solar operations, fleet monitoring, reliability analysis, operations centers, software development and debugging, or other technical roles.
Understanding of renewable energy technologies, mechanics, and system behavior (electro-mechanical systems, communications, control signals).
Experience performing Root Cause Analysis using operational and monitoring data, including logs, trend charts, and event histories.
Strong analytical skills with high attention to detail and a structured, data-driven problem-solving approach.
Clear technical writing skills and the ability to communicate findings to both technical and non-technical audiences, including customers and senior stakeholders.
Proficiency with web-based monitoring platforms and strong working knowledge of Excel (or Google Sheets) for data analysis and reporting; familiarity with NX Navigator is highly desirable.
Technical & Analytical
Experience or familiarity with NX Navigator or other web-based fleet monitoring / SCADA-like platforms for utility-scale assets.
Ability to interpret time-series data, alarms, and event logs to diagnose performance and reliability issues.
Proficiency with Excel (or Google Sheets) for data analysis, trending, and reporting.
Strong written and verbal communication skills, with the ability to craft concise incident summaries, RCA documents, and status updates.
Proven ability to work cross-functionally with Technical Services, Engineering, Product, and Operations teams.
Customer- and stakeholder-focused mindset, ensuring information is accurate, timely, and tailored to audience needs.
Strong organizational skills with the ability to prioritize and manage multiple events and monitoring tasks concurrently.
Reliability and consistency in following established SOPs, workflows, and documentation standards in a time-sensitive environment.
Adaptability to evolving operational needs, portfolio growth, and changes in monitoring tools or processes.
Comfort operating in a fast-paced, incident-driven environment, including occasional support during off-hours events as required by coverage models.

Nice To Haves

Familiarity with Python or similar analytical tools for more advanced data processing.
Familiarity with dashboarding and analytical tools such as Power BI and Databricks, particularly for building or interacting with reliability and performance dashboards.
Understanding of weather-driven operational risk, including how wind, storms, and extreme conditions influence tracker and plant behavior.
Exposure to reliability engineering concepts, anomaly detection, or performance analytics.
Experience working with ticketing or case management systems (e.g., Jira, ServiceNow, Salesforce Service Cloud or equivalent) to track incidents and follow-up actions.
Exposure to Confluence or similar knowledge management tools for SOPs, RCAs, and incident documentation.
Comfort using collaboration tools (e.g., email, chat, shared docs) in a fast-paced operational environment.
Willingness to participate in cross-functional reviews and share insights that improve product design, field practices, and monitoring strategies.

Responsibilities

Fleet Monitoring & Operational Awareness
Monitor utility-scale solar tracker fleets using web-based monitoring platforms, including NX Navigator, to maintain real-time awareness of system status.
Identify abnormal system states, communication failures, and offline assets across assigned customer portfolios.
Support remote operational actions during high-wind and severe weather events, including coordination of tracker stow and recovery activities under the direction of the Manager, Remote Monitoring & Asset Resilience.
Maintain clear situational awareness across active customer sites, including key alarms, stow states, communication health, and emerging risk signals.
Log and track monitoring observations, ensuring key events are captured in internal systems and aligned with established RMC workflows and SOPs.
Incident Response & Reliability Analysis
Perform structured Root Cause Analysis (RCA) for system alarms, outages, and post-weather events using operational data, logs, SCADA-like signals, and environmental inputs.
Correlate tracker behavior, monitoring signals, and weather data to determine probable failure mechanisms and reliability risks.
Produce clear, technically sound incident summaries and RCA documentation for customers, Technical Services, and internal stakeholders.
Support warranty-aligned documentation and evidence collection, ensuring events are captured in a way that supports potential warranty claims and risk assessments.
Participate in post-event reviews, providing data-driven input on incident timelines, system behavior, and key contributing factors.
Customer & Technical Services Support
Provide monitoring-based technical analysis to support customer issues managed by the Technical Services team and other customer-facing functions.
Translate complex system behavior into clear, actionable insights that enable Technical Services to prioritize and execute field or remote actions.
Ensure that incident records, timelines, and findings meet internal service expectations and quality standards for accuracy, completeness, and clarity.
Support preparation of materials for customer calls, reports, and follow-ups by supplying data extracts, charts, and concise technical summaries derived from monitoring platforms.
Performance Trends & Continuous Improvement
Identify recurring issues, performance degradation patterns, and systemic reliability risks across the monitored fleet.
Contribute recommendations to improve monitoring thresholds, alerting logic, and response workflows, helping to reduce false alarms and improve signal-to-noise ratio.
Support refinement of monitoring tools, dashboards, and operational playbooks in partnership with the Manager, Remote Monitoring & Asset Resilience, and cross-functional stakeholders.
Participate in pilots or trials of new monitoring features, analytics capabilities, or alert configurations, providing structured feedback on effectiveness and usability.
Cross-Functional Collaboration & Documentation
Partner with Engineering, Product, Operations, and Technical Services teams to share monitoring-based field intelligence and support long-term reliability improvements.
Contribute to the creation and maintenance of SOPs, monitoring playbooks, training materials, and internal knowledge bases used by the Remote Monitoring Center.
Document findings, workflows, and lessons learned in a clear and reusable format to support team scaling and onboarding.
Support knowledge sharing and best-practice development within the monitoring and reliability team, including informal coaching of peers on tools, workflows, and analysis methods.