Belay Technologies was voted Baltimore Business Journal's (BBJ) Best Places to Work in 2019, runner-up in 2020, and a finalist in 2021! Belay Technologies is seeking a System Engineer (HPDA) to join our intel team.
Labor Category Description: Applies systems engineering principles throughout the systems life cycle phases: Concept, Development, Production, Utilization, Support, and Retirement. Interacts with the Government regarding Systems Engineering technical considerations and associated problems, issues, or conflicts. Communicates with other program personnel, government overseers, and senior executives. Responsible for the technical integrity, quality, and completeness of work performed and deliverables associated with one or more of the 25 process areas defined by ISO/IEC 15288:
Technical Process Area - Stakeholder Requirements Definition, Requirements Analysis, Architectural Design, Implementation, Integration, Verification, Transition, Validation, Operation, Maintenance, and Disposal.
Project Process Area - Project Planning, Project Assessment and Control, Decision Management, Risk Management, Configuration Management, Information Management, and Measurement.
Enterprise (Organizational Project-Enabling) Process Area - Project Portfolio Management, Infrastructure Management, Lifecycle Model Management, Human Resource Management, and Quality Management.
Agreement Process Area - Acquisition and Supply.
The project is a foundational effort to develop full reliability and resiliency for the newest HPC systems and customers. The work involves reviewing reliability class requirements, mission customer needs, IT system requirements, implementation, project planning, and risk management. Team members independently analyze various elements, develop recommendations, engage in planning activities, implement various elements, validate and verify solutions through testing, integrate solutions to identified gaps, and support ongoing risk management.
Example tasks include:
Determine the most critical mission support activities and corresponding IT assets, such as servers, applications, and data, which are essential for business continuity. Prioritize these assets in the resiliency plan based on their importance to the organization.
Formulate strategies to recover critical IT assets following a disruptive event. This may include implementing backup systems, adopting redundancy measures, or using alternative work locations.
Independently analyze and recommend redundancy measures for critical components, such as backup systems, communication networks, and data centers, to ensure continuous operation in case of a disruption. Also establish fault tolerance mechanisms to detect, isolate, and recover from errors to maintain mission continuity.
Develop a detailed disaster recovery plan, focusing on the requirements of high criticality missions. Define recovery time objectives (RTOs) and recovery point objectives (RPOs) based on the mission's specific needs, and create procedures to restore critical systems and infrastructure to functional states. Regularly test and update the disaster recovery plan to maintain its effectiveness.
Continuously review and update the resiliency plan to reflect changes in the infrastructure, business requirements, and potential threats. This may include regular testing and validation of the plan to ensure its effectiveness.
Work with the customer metrics and monitoring team to introduce new metrics capabilities to support the resiliency program.
Regularly review the plan's effectiveness and performance through post-event analysis, audits, or reviews. Implement improvements and updates to the plan as necessary based on lessons learned and changing circumstances.
Involve key stakeholders in the development and implementation of the resiliency plan, such as upper management, IT staff, and external partners like vendors or service providers.
Ensure that the resiliency plan aligns with relevant standards, guidelines, and regulations that govern highly critical missions. Conduct regular audits and assessments to maintain compliance and promote continuous improvement.
Analyze and propose ongoing risk management and monitoring processes to identify and respond to emerging threats, changes in operational environments, and technology advancements. Regularly update and adapt the resiliency plan to maintain the mission's overall resilience.
Experience with mission assurance, reliability and resiliency planning, and stakeholder engagement is beneficial.
Job Type
Full-time
Career Level
Mid Level
Education Level
High school or GED