Hardware Engineer - Server Hardware Management

Morgan Stanley | New York, NY
1d | $120,000 - $165,000

About The Position

In the Technology division, we leverage innovation to build the connections and capabilities that power our Firm, enabling our clients and colleagues to redefine markets and shape the future of our communities. This is a Technology Leadership & Strategy position at the Director level, which is part of the job family responsible for driving innovation, developing technology roadmaps, and providing vision and strategic direction to align technology initiatives with long-term business objectives, fostering a culture of excellence, growth and transformation across the organization.

Morgan Stanley is an industry leader in financial services, known for mobilizing capital to help governments, corporations, institutions, and individuals around the world achieve their financial goals. Interested in joining a team that’s eager to create, innovate and make an impact on the world? Read on.

The Morgan Stanley Innovation Labs, part of Firmwide Innovation, provide the technical environments and strategic support that help teams explore emerging technologies and turn early ideas into scalable solutions. Firmwide Innovation connects technology and business leaders to deliver trusted advice, strategic services, and purpose-built environments that accelerate experimentation and guide informed decisions across the Integrated Firm.

The Labs operate a hybrid footprint that includes on-premises data center infrastructure, advanced hardware, virtualized platforms, and a significant public cloud presence. These environments enable secure and realistic testing across areas such as AI, accelerated compute, networking, and next-generation infrastructure. They support thousands of assets used across Technology and play a central role in research, vendor evaluation, pilots, and solution design.

The Network Compute and Storage Squad manages the compute, storage, and network foundations that power these environments. The team oversees capacity, performance, monitoring, automation, and operational excellence to ensure the Labs remain reliable, flexible, and aligned with evolving firmwide needs. The Squad supports the full lifecycle of technology assessments by maintaining high-quality environments that help validate solutions, reduce risk, and strengthen decision making across the Firm.

The Innovation Labs team is looking for a highly skilled and motivated Hardware Engineer with expertise in GPU technologies. In this position, you will design, develop, troubleshoot, and implement strategies to automate and optimize server hardware infrastructure. Success in this role requires working closely with cross-functional teams, external vendors, and internal stakeholders to provide high-performance and reliable hardware solutions.

Requirements

  • Minimum four years of hands-on experience supporting and troubleshooting data center GPUs, including H100 and NVIDIA DGX B300 series or newer.
  • Demonstrated proficiency with advanced technologies, including InfiniBand and NVLink.
  • Strong proficiency in Ansible and Python.
  • Experience with IPMI and preferably Redfish for programmatic communication with server BMCs.
  • Ability to collaborate effectively with engineers and developers in Agile environments.
  • Experience managing, deploying, and troubleshooting large-scale production environments, including applying security principles and system hardening.
  • Knowledge of Linux, as well as O/S and network protocols.
  • Knowledge of x86 hardware and peripherals, including Out-of-Band or Lights-Out Management.
  • In-depth knowledge of server hardware, components, and management technologies, particularly GPUs and PCIe devices.
  • Effective troubleshooting skills across hardware, O/S, network, and storage.
  • Master's degree in computer science, computer engineering, or equivalent experience.
  • Excellent communication skills paired with strong self-management capabilities.

Nice To Haves

  • Networking knowledge is an added advantage
  • Experience working in Financial Services or Enterprise Technology firms is preferred but not mandatory
  • Experience driving enterprise-level initiatives and working with senior stakeholders across various regions and cultures

Responsibilities

  • Work on business-enabling infrastructure projects that use leading-edge CPU, GPU, APU, storage, and networking architectures, as well as on security strengthening and operational scaling.
  • Create clear procedures for testing hardware internally, deploying systems, optimizing performance, and resolving technical issues.
  • Develop and maintain thorough documentation covering hardware designs, specifications, testing procedures, and results.
  • Conduct thorough evaluations of hardware systems to identify operational problems and recommend effective improvements that boost overall efficiency.
  • Create and build software solutions that include in-house systems, third-party vendor platforms, and open-source technologies.
  • Deliver dependable automation solutions designed to enhance the management of our Innovation Lab servers and network infrastructure. This includes facilitating remote access, updating firmware via iDRAC/IPMI, and integrating peripheral devices efficiently.
  • Troubleshoot complex problems involving software programs, operating systems, and hardware components.
  • Assess and certify each new or replacement device, providing thorough analysis of how it integrates with the MS plant.