Reliability & Monitoring Engineer

Nextpower, Nashville, TN

About The Position

Position Overview / Role Purpose

The Reliability & Monitoring Engineer is responsible for fleet-level monitoring, incident analysis, and reliability insights for Nextracker-supported utility-scale solar tracker systems. This role provides real-time system visibility, post-event analysis, and actionable intelligence that support rapid recovery and long-term asset reliability, particularly following severe weather and other high-impact events.

Operating within a portfolio-based support model, the Reliability & Monitoring Engineer translates monitoring data into clear technical insights that improve system uptime, inform customer communication, and strengthen long-term asset performance. This is a desk-based role within the Nextpower organization focused on proactive monitoring, analytical investigation, and continuous operational improvement. The engineer works closely with the U.S. Technical Services organization and the Manager, Remote Monitoring & Asset Resilience (U.S.).

Key Objectives

  • Deliver High-Quality Fleet Monitoring: Continuously monitor utility-scale tracker fleets to detect abnormal system behavior, communication failures, and offline assets across customer portfolios.
  • Lead Incident Analysis & Root Cause Investigation: Perform structured incident analysis and Root Cause Analysis (RCA) for alarms, outages, and post-weather events, producing clear, technically sound findings.
  • Support Technical Services & Customer Communication: Provide monitoring-based insights and documentation that enhance Technical Services' ability to resolve issues quickly and communicate effectively with customers.
  • Drive Reliability Insights & Operational Improvement: Identify recurring issues and systemic risks, and contribute to the refinement of monitoring thresholds, alert logic, and operational playbooks that improve asset resilience.

Requirements

  • Bachelor’s degree in Electrical Engineering, Energy Engineering, Renewable Energy, or a related technical field; equivalent relevant experience will be considered.
  • 2+ years of experience in solar operations, fleet monitoring, reliability analysis, operations centers, software development and debugging, or other technical roles.
  • Understanding of renewable energy technologies, mechanics, and system behavior (electro-mechanical systems, communications, control signals).
  • Experience performing Root Cause Analysis using operational and monitoring data, including logs, trend charts, and event histories.
  • Strong analytical skills with high attention to detail and a structured, data-driven problem-solving approach.
  • Clear technical writing skills and the ability to communicate findings to both technical and non-technical audiences, including customers and senior stakeholders.
  • Proficiency with web-based fleet monitoring or SCADA-like platforms for utility-scale assets; experience with or familiarity with NX Navigator is highly desirable.
  • Ability to interpret time-series data, alarms, and event logs to diagnose performance and reliability issues.
  • Strong working knowledge of Excel (or Google Sheets) for data analysis, trending, and reporting.
  • Strong written and verbal communication skills, with the ability to craft concise incident summaries, RCA documents, and status updates.
  • Proven ability to work cross-functionally with Technical Services, Engineering, Product, and Operations teams.
  • Customer- and stakeholder-focused mindset, ensuring information is accurate, timely, and tailored to audience needs.
  • Strong organizational skills with the ability to prioritize and manage multiple events and monitoring tasks concurrently.
  • Reliability and consistency in following established SOPs, workflows, and documentation standards in a time-sensitive environment.
  • Adaptability to evolving operational needs, portfolio growth, and changes in monitoring tools or processes.
  • Comfort operating in a fast-paced, incident-driven environment, including occasional support during off-hours events as required by coverage models.

Nice To Haves

  • Familiarity with Python or similar analytical tools for more advanced data processing.
  • Familiarity with dashboarding and analytical tools such as Power BI and Databricks, particularly for building or interacting with reliability and performance dashboards.
  • Understanding of weather-driven operational risk, including how wind, storms, and extreme conditions influence tracker and plant behavior.
  • Exposure to reliability engineering concepts, anomaly detection, or performance analytics.
  • Experience working with ticketing or case management systems (e.g., Jira, ServiceNow, Salesforce Service Cloud, or equivalent) to track incidents and follow-up actions.
  • Exposure to Confluence or similar knowledge management tools for SOPs, RCAs, and incident documentation.
  • Comfort using collaboration tools (e.g., email, chat, shared docs) in a fast-paced operational environment.
  • Willingness to participate in cross-functional reviews and share insights that improve product design, field practices, and monitoring strategies.

Responsibilities

Fleet Monitoring & Operational Awareness

  • Monitor utility-scale solar tracker fleets using web-based monitoring platforms, including NX Navigator, to maintain real-time awareness of system status.
  • Identify abnormal system states, communication failures, and offline assets across assigned customer portfolios.
  • Support remote operational actions during high-wind and severe weather events, including coordination of tracker stow and recovery activities under the direction of the Manager, Remote Monitoring & Asset Resilience.
  • Maintain clear situational awareness across active customer sites, including key alarms, stow states, communication health, and emerging risk signals.
  • Log and track monitoring observations, ensuring key events are captured in internal systems and aligned with established Remote Monitoring Center (RMC) workflows and SOPs.

Incident Response & Reliability Analysis

  • Perform structured Root Cause Analysis (RCA) for system alarms, outages, and post-weather events using operational data, logs, SCADA-like signals, and environmental inputs.
  • Correlate tracker behavior, monitoring signals, and weather data to determine probable failure mechanisms and reliability risks.
  • Produce clear, technically sound incident summaries and RCA documentation for customers, Technical Services, and internal stakeholders.
  • Support warranty-aligned documentation and evidence collection, ensuring events are captured in a way that supports potential warranty claims and risk assessments.
  • Participate in post-event reviews, providing data-driven input on incident timelines, system behavior, and key contributing factors.

Customer & Technical Services Support

  • Provide monitoring-based technical analysis to support customer issues managed by the Technical Services team and other customer-facing functions.
  • Translate complex system behavior into clear, actionable insights that enable Technical Services to prioritize and execute field or remote actions.
  • Ensure that incident records, timelines, and findings meet internal service expectations and quality standards for accuracy, completeness, and clarity.
  • Support preparation of materials for customer calls, reports, and follow-ups by supplying data extracts, charts, and concise technical summaries derived from monitoring platforms.

Performance Trends & Continuous Improvement

  • Identify recurring issues, performance degradation patterns, and systemic reliability risks across the monitored fleet.
  • Contribute recommendations to improve monitoring thresholds, alerting logic, and response workflows, helping to reduce false alarms and improve signal-to-noise ratio.
  • Support refinement of monitoring tools, dashboards, and operational playbooks in partnership with the Manager, Remote Monitoring & Asset Resilience, and cross-functional stakeholders.
  • Participate in pilots or trials of new monitoring features, analytics capabilities, or alert configurations, providing structured feedback on effectiveness and usability.

Cross-Functional Collaboration & Documentation

  • Partner with Engineering, Product, Operations, and Technical Services teams to share monitoring-based field intelligence and support long-term reliability improvements.
  • Contribute to the creation and maintenance of SOPs, monitoring playbooks, training materials, and internal knowledge bases used by the Remote Monitoring Center.
  • Document findings, workflows, and lessons learned in a clear and reusable format to support team scaling and onboarding.
  • Support knowledge sharing and best-practice development within the monitoring and reliability team, including informal coaching of peers on tools, workflows, and analysis methods.