Production Systems Engineer

Meta Platforms, Inc.
Menlo Park, CA
34d

About The Position

Meta is seeking a Production Systems Engineer to join our Hardware Design and Release to Production (HDRTP) team. Our servers and data centers are the foundation on which our rapidly scaling infrastructure operates efficiently to deliver Meta's services globally. The HDRTP team is responsible for the end-to-end hardware lifecycle of all Meta servers, from exploration and development to production health. HDRTP engineers work closely with Production Engineering teams, Enterprise Networking, hardware designers, networking teams, manufacturers, vendors, Datacenter Operations teams and New Product Introduction teams to ensure the smooth operation of systems across the planet. We encounter problems from the very smallest scale (errors occurring within single registers of a CPU) up to the very largest (deploying solutions to Meta's millions of devices globally). We focus on finding solutions to complex issues, embracing ambiguity, driving impact, and tackling the hardest problems in the domain.

Requirements

  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • 6+ years of experience coding in a higher-level language (Python, PHP, Java, Go, Rust, C++)
  • Experience building, maintaining and debugging production services or platforms, usually (but not necessarily) in a Linux/Unix environment
  • Knowledge of server architecture and components across Compute/Storage/AI Systems/Networking

Nice To Haves

  • Experience managing and debugging hardware platforms in a cloud environment
  • Demonstrated experience driving projects to successful business outcomes

Responsibilities

  • Build and develop tooling solutions to automate business-critical processes that manage the health of the Meta production hardware fleet
  • Troubleshoot, diagnose and root-cause system failures, working with key partners to identify and deliver solutions
  • Proactively identify opportunities to fix or enhance tooling, hardware and processes
  • Build subject matter expertise in one or more of the specialist areas covered by the RTP (Release To Production) team
  • Apply a scientific approach to troubleshooting, root-cause analysis and investigation

What This Job Offers

Job Type

Full-time

Career Level

Mid Level

Industry

Broadcasting and Content Providers

Number of Employees

5,001-10,000 employees
