The Whiting School of Engineering's Johns Hopkins Data Science and AI Institute (DSAI) seeks a DevOps Engineer (Sr. Systems Engineer) to design, configure, and maintain tools and processes to facilitate the work of DSAI’s research and software engineers. This engineer will collaborate daily with DSAI RSEs and engineers from the Institute for Data Intensive Engineering and Science (IDIES, which provides much of DSAI’s local compute and storage), Whiting School of Engineering, JHU Central IT, JHU Research IT, and JHU’s HPC.

The Sr. Systems Engineer will provide technical leadership, project management, and task execution for administration, programming, maintenance, performance, implementation, security, and support of multiple departmental and enterprise-wide platforms, including the installation and testing of new software, operating systems, related utilities/services, and hardware products as well as the integration of new products and/or software release upgrades into the current environment. The Sr. Systems Engineer will conduct systems performance evaluations, monitoring, patch management, and security evaluations. The Sr. Systems Engineer will analyze user needs in various computer environments (including but not limited to mainframe, Windows, and mid-range) and make recommendations for products and services that meet those needs. The Sr. Systems Engineer will ensure that all systems environments are maintained in an efficient and cost-effective manner.

Specific Duties & Responsibilities

Systems Analysis/Design (Environment/Platform)
Design highly complex business, clinical, education, or infrastructure solutions by meeting with customers to observe and understand current processes and the issues related to those processes.
Provide written documentation and diagrams of findings to share with the client and other IT colleagues.
Assist lower levels to effectively use the system's technical software.
Design highly complex solutions that conform to institutional policies, standards, and guidelines, to the infrastructure environment, and to vendor and industry best practices to deliver a quality product.
Select infrastructure applications that reside between end user applications and hardware operating systems by working with vendors, customers, and other sources (e.g., open source or Internet2 initiatives) to provide configurable tools to the customers.
Develop new methods to improve service processes, performance, and functionality by examining system management tools and processes.
Review new methods suggested by lower levels and approve the work.
Research, recommend, and implement new technologies based on the value to the institution.
Work with vendor processes and products to improve the quality and fit for the institution. Typically establish product mastery and demonstrate initiative for improvements.
Assign and lead technical systems analysis and design tasks for assigned environments and platforms.

Install & Configure
Install and configure highly complex server hardware and operating systems by following technical documentation to provide a working product.
Evaluate, implement, and manage appropriate highly complex software and hardware solutions by using best practices for the environment to ensure system integrity.
Install and configure infrastructure applications by following product installation and configuration directions and industry best practices to deliver a solution to the customers.
Ensure an effective schedule of system backups and archive operations is developed by providing leadership, oversight, and direction to the technical team in best practices for the environment to ensure data/media recoverability.
Lead and provide direction to technical team for all of the above tasks by reviewing work and adherence to institutional standards and guidelines to deliver projects on time and within budget to the customers.

Maintain & Troubleshoot
Provide highly complex server level administration (manage HW/SW, maintenance, upgrades and patches, account maintenance, backups and recoveries, and assist users) by following documented procedures to ensure a stable environment.
Monitor and tune the system by following documentation and procedures to achieve optimum performance levels.
Develop highly complex scripts and solutions by using departmental standards to automate systems management.
Perform highly complex system software upgrades including planning and scheduling, testing, and coordination by following documentation and departmental standards to provide a stable product for the environment.
Audit and maintain user access and authorization by following access and authorization documentation to provide for system security.
Generate and maintain highly complex periodic and ongoing system-specific reports by using appropriate tools to assess system performance, integrity, and capacity in order to deliver a stable environment to the users.
Follow and maintain IT security awareness and best practices by understanding security principles as they pertain to environments supported in order to deliver secure solutions to customers.
Utilize system management and monitoring tools and incident tracking systems by following documentation and standards to detect incidents, take corrective actions, and determine root cause.
Monitor changes and resolve any incidents by responding to problems as they occur, by reviewing all processing and output of the newly implemented solution, and by proactively ensuring the solution works successfully to satisfy the customer requirements and to provide a smooth transition to the new solution.
Lead and provide direction to technical team for all the above tasks by reviewing work and adherence to institutional standards and guidelines to deliver high quality maintenance and troubleshooting to the customers.

Project Collaboration & Lifecycle Participation
Implement changes by adhering to the change management policies and procedures for any given project to communicate to all parties the nature, significance, and risk factors of the solution.
Lead effort to develop RFPs by engaging project team members in the process in order to develop well defined requirements to potential vendors for proposed solutions.
Evaluate vendor proposals by reviewing requirements for the product to select the most appropriate vendor.
Lead vendors, consultants, and inside Enterprise groups in developing applications by meeting with the team on a regular basis to deliver quality products to customers.
Lead scheduled project team meetings by attending all meetings to provide input to the project team.
Author and maintain documentation by writing audience-appropriate materials to serve as technical and/or end user reference.
Lead technical team in test planning, test scenario construction, and test sessions appropriate to the changes being implemented by following testing guidelines to ensure all delivered solutions work as expected and errors are handled in a meaningful way.
Review test results and corrections to all changes by following institutional and departmental testing standards to ensure all delivered solutions work as expected and errors are handled in a meaningful way.
Participate in Institutional and Departmental committees and initiatives.
Lead and provide direction to technical team for all of the above tasks by reviewing work and adherence to institutional standards and guidelines to ensure collaboration and communication with team members and customers.
Perform other related duties as requested.

In addition to the duties described above:

Infrastructure Management
Design, implement, and maintain on-premises and cloud-based infrastructure for DSAI researchers and projects.
Manage and optimize resource allocation, ensuring efficient utilization of compute, storage, and network resources.

CI/CD and Automation
Develop and implement CI/CD pipelines for software development and deployment.

Collaboration and Support
Collaborate with IDIES on configuration and maintenance of local compute and storage.
Collaborate with researchers and data scientists to understand their infrastructure needs and provide technical guidance.
Support the deployment and scaling of machine learning models on various platforms, including cloud-based services and on-premises clusters.
Work closely with IT security teams to ensure the security and integrity of DSAI systems and data.
Become an expert in the compute and storage options that JHU makes available through its IT organizations (IDIES, Central IT, Research IT, Whiting School of Engineering, JHU HPC, Azure, AWS) and act as an advisor, mentor, and liaison for RSEs seeking to use them.

Research Computing Support
Assist researchers with utilizing high-performance computing (HPC) clusters and specialized hardware for computationally intensive tasks.
Optimize research workflows and provide guidance on best practices for utilizing computational resources.

Job Type
Full-time
Career Level
Senior