Production Systems Engineer

Meta
Menlo Park, CA
$144,000 - $204,000

About The Position

Meta is seeking a highly skilled and experienced Systems/Production Systems Engineer to join our Release to Production (RTP) team. The RTP team is responsible for the end-to-end Hardware Lifecycle of all Meta servers, including prototyping, pre-production hands-on system validation, hardware debugging, enabling production-ready system monitoring, automated provisioning, and automated remediation of issues. As a Systems/Production Systems Engineer, you will work closely with various teams, including HW/SW co-design teams, hardware designers, networking teams, system manufacturers, component vendors, capacity engineering, production engineering, production services, and data center operations teams to enable new systems that will be deployed in our production data centers.

Requirements

  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • 6+ years of experience in server hardware system support, with knowledge of server architecture and components
  • Experience with Linux and scripting (Python or similar)
  • Experience in changing system configurations and measuring change impact
  • Experience working in a matrix organization
  • Experience working with different server system/data center products
  • Demonstrated problem-solving skills, with experience in troubleshooting complex technical issues
  • Demonstrated communication and collaboration skills, with a track record of working effectively with cross-functional teams

Nice To Haves

  • Demonstrated experience in contributing to data-driven decarbonization efforts for hardware development by applying analytics and modeling to support reductions in data center rack emissions
  • Proven track record of supporting the design and implementation of Net Zero initiatives within Meta Infrastructure Hardware, including developing metrics, dashboards, and predictive models to monitor progress and inform team decisions
  • Skilled in collaborating with cross-functional teams (Infrastructure Hardware, Sustainability, Capacity Planning, Data Center Operations, Network Infrastructure, Finance, and Sourcing) to help align on decarbonization solutions and program execution across multiple locations
  • Proficient in building and maintaining scalable data pipelines to integrate emissions, energy, and operational data from different sources, enabling real-time monitoring and actionable insights
  • Experience in applying statistical, machine learning, and reliability modeling techniques to analyze emissions impacts, detect anomalies, and recommend improvements for hardware and infrastructure sustainability
  • Experience in advancing sustainability analytics by refining Key Performance Indicators, automating reporting, and supporting Meta’s Net Zero goals through innovative data science solutions in partnership with engineering and business teams

Responsibilities

  • Interface with external vendors and internal hardware, mechanical, power, thermal, manufacturing, and software engineers to understand system architecture and develop test suites for various architectures
  • Proactively create experiments and tooling to detect and diagnose hardware/firmware/software health issues
  • Develop test frameworks for large-scale test automation inside the fleet during product development and after mass production
  • Implement remediations across software and hardware stacks according to plan, while keeping thorough procedural records and data logs
  • Troubleshoot, diagnose, and root-cause system failures, isolating components and failure scenarios while working with internal and external stakeholders
  • Develop visibility through data visualization and implement systemic solutions to hardware health issues
  • Drive necessary discussions with external and internal teams on test specification and methodologies to improve test quality continuously
  • Contribute to Meta's 2030 Net Zero targets by evaluating the sustainability and carbon footprint of new hardware and infrastructure designs
  • Partner with Net Zero teams to implement strategies across infrastructure for reuse, recycling, energy-aware computing, and quality practices for deployed and decommissioned hardware

Benefits

  • Bonus
  • Equity
  • Benefits