Principal GPU/CPU Systems Engineer

Oracle, Seattle, WA

About The Position

Career Level: IC5

Hiring range (US): $120,100 to $251,600 USD per annum. May be eligible for bonus, equity, and compensation deferral. Range and benefit information provided in this posting are specific to the stated locations only.

Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions and locations, as well as reflect Oracle's differing products, industries and lines of business. Candidates are typically placed into the range based on the preceding factors as well as internal peer equity.

Disclaimer: Certain US customer or client-facing roles may be required to comply with applicable requirements, such as immunization and occupational health mandates.

The role will generally accept applications for at least three calendar days from the posting date or as long as the job remains posted.

Requirements

  • 10 or more years of experience in hardware design, system engineering, and platform bring-up.
  • Hands-on experience with market-leading GPUs or AI platforms spanning development, bring-up, test, and characterization.
  • Strong knowledge of AI/GPU and/or AI/CPU platform architectures and capabilities.
  • Experience evaluating system architectures, platform definitions, and implementation paths.
  • Ability to balance hardware performance, power, cost, regulatory, and cross-functional requirements.
  • Experience with modern server platforms across x86 and ARM architectures.
  • Hardware development experience at the system, board, and FPGA levels.
  • Proficiency reviewing hierarchical schematics, advanced multilayer board layouts, and end-to-end interconnects.
  • Strong understanding of firmware and system diagnostics using BMC firmware, UEFI or BIOS, and Linux tools.
  • Experience scripting and customizing diagnostics, validation, and test workflows.
  • Experience with GPU supplier test code and open-source AI test and characterization tools.
  • Experience with system integration, validation, and performance characterization.
  • Strong understanding of high-speed buses and interconnects used in modern AI and compute platforms.
  • Demonstrated ability to debug and root-cause complex hardware and software issues.
  • Ability to document design intent and technical specifications clearly.
  • Strong communication skills with the ability to explain complex technical topics across engineering teams and executive audiences.
  • Proven ability to provide cross-functional technical leadership and collaborate effectively with internal teams and external partners.

Nice To Haves

  • Experience using hardware debuggers.
  • Experience with PCIe, DDR, Ethernet, USB, SPI, and related interfaces.
  • Experience with platform-level security technologies.
  • Experience with power circuit design and signal integrity.

Responsibilities

  Platform Architecture and Definition:
  • Participate in platform definition, architecture evaluation, and analysis for existing and next-generation Cloud AI platforms.
  • Evaluate system architectures, proposed implementations, and scaling and optimization strategies.
  • Review and assess third-party merchant silicon used for AI accelerator modules and GPU/CPU platforms.
  • Balance hardware performance priorities against power, cost, regulatory, and cross-functional requirements.

  Platform Development and Oversight:
  • Drive definition, development, integration, debug, characterization, and tuning of AI hardware platforms.
  • Provide platform development oversight for internal teams and third-party partners.
  • Work with in-house engineering experts on design reviews, schematics, board layout, and implementation decisions.
  • Document and specify design intent and technical details in collaboration with engineering teams.

  System Integration, Validation, and Performance:
  • Guide and support system integration, system test, qualification, and characterization.
  • Define and oversee system validation plans, diagnostics features, and test strategies.
  • Develop and expand system characterization and performance testing capabilities.
  • Utilize supplier-provided and approved open-source AI platform qualification and test tools.
  • Support definition of in-service system monitoring, error reporting, and operational health visibility.

  Cross-Functional and Partner Collaboration:
  • Collaborate with GPU and AI chip suppliers, system architects, firmware developers, and hardware engineers.
  • Partner with storage, networking, compute, quality, security, cloud orchestration, and manufacturing teams.
  • Support development program managers with technical assessments and planning.
  • Assist manufacturing teams to ensure hardware is secure, robustly evaluated, and production-ready.

  Security, Support, and Operations:
  • Participate in hardware platform security evaluations.
  • Guide internal teams and partners on scaling, monitoring, and deploying AI platforms into the cloud.
  • Serve as a senior technical advisor to Oracle hardware, software, cloud, and support teams.
  • Act as the final level of engineering support for complex deployed product issues.
  • Assist with root-cause analysis through lab replication, remote debug, and cross-team collaboration.

Benefits

  • Medical, dental, and vision insurance, including expert medical opinion
  • Short-term and long-term disability
  • Life insurance and AD&D
  • Supplemental life insurance (Employee/Spouse/Child)
  • Health care and dependent care Flexible Spending Accounts
  • Pre-tax commuter and parking benefits
  • 401(k) Savings and Investment Plan with company match
  • Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years of employment. Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.
  • 11 paid holidays
  • Paid sick leave: 72 hours of paid sick leave upon date of hire, refreshed each calendar year. Unused balance carries over each year up to a maximum cap of 112 hours.
  • Paid parental leave
  • Adoption assistance
  • Employee Stock Purchase Plan
  • Financial planning and group legal
  • Voluntary benefits, including auto, homeowner, and pet insurance

What This Job Offers

Job Type

Full-time

Career Level

Senior

Education Level

No Education Listed

Number of Employees

5,001-10,000 employees
