Data Center Engineer

Etched•San Jose, CA

$130,000 - $210,000•Onsite

About The Position

Deploying next-generation inference hardware at scale requires more than great chips; it demands world-class physical infrastructure. As a Data Center Engineer at Etched, you will own the end-to-end lifecycle of our data center and hardware lab environments: from facility selection and rack design, through power distribution, networking layout, cabling, hardware management, and day-to-day operations, to long-term capacity planning. You'll work directly with the hardware, platform, and software teams to bring Sohu systems online faster, keep them running harder, and push the limits of what dense, high-power AI inference design and manufacturing infrastructure can do.

This is not a traditional data center operations role. We expect you to treat data center engineering with the same rigor and craftsmanship we apply to our chip design, thinking from first principles about power density, thermal constraints, network topology, and physical security. You'll be making real architectural decisions that directly shape how our products are engineered and manufactured, and how they reach customers.

You will be on the ground in our co-location facilities and lab environments, working hands-on with custom server platforms and high-speed networking. You'll drive the processes, tooling, and vendor relationships that allow us to scale our infrastructure as fast as our product roadmap demands.

Requirements

5+ years of hands-on data center engineering or operations experience, with direct responsibility for physical hardware deployment, power architecture, and facility management.
Have designed and deployed high-density compute environments (20 kW/rack and above), with first-hand experience managing the thermal and power challenges that come with them.
Deeply comfortable with structured cabling, fiber and copper plant management, and high-speed networking hardware at scale.
Can read and interpret electrical one-line diagrams, raised-floor and hot-aisle/cold-aisle plans, and co-location facility documentation without assistance.
Have built or operated monitoring and DCIM tooling, and treat infrastructure visibility as a non-negotiable property of any environment you own.
Strong vendor manager - you know how to write an RFP, run a competitive evaluation, and hold a co-lo or hardware vendor to their commitments.
Thrive in fast-moving environments where requirements shift quickly and you need to make confident decisions with incomplete information.
Driven by ownership and take pride in environments that are clean, documented, and operationally excellent - not just ones that are "up."

Nice To Haves

Physical deployment and bring-up of custom or semi-custom server platforms, including early-stage hardware that doesn't come with vendor support.
Liquid cooling systems (direct liquid cooling, rear-door heat exchangers, or immersion cooling) and the facility requirements they impose.
AI or HPC cluster environments - GPU or ASIC clusters, high-radix switch fabrics, RDMA networking.
Scripting and automation (Python, Bash, Ansible) for asset tracking, environmental monitoring integration, or operational workflows.
Working within a semiconductor or hardware startup, where roadmaps compress and infrastructure needs to keep pace with silicon.

Responsibilities

Own rack layout, capacity planning, power distribution, network design, cabling, and physical deployment of Etched high-performance computing platforms across data center and hardware lab environments.
Design and manage power distribution and redundancy architectures, from utility feeds and PDUs to per-rack power and cooling budgets, for high-density AI compute deployments pushing 90 kW per rack.
Collaborate with physical infrastructure and facilities teams, as well as external vendors, to build and manage highly sophisticated hardware labs used for bring-up, EVT, and customer demos.
Partner with co-location vendors and internal teams to evaluate sites, negotiate contracts, and enforce SLAs around power, cooling, physical security, and network connectivity.
Architect and implement high-speed networking infrastructure (100G/200G/400G Ethernet) connecting compute nodes, storage, and upstream peering, in coordination with network and platform engineering teams.
Develop and maintain asset management systems, rack diagrams, and change control processes to ensure full visibility into physical infrastructure state at all times.
Build and operate monitoring and alerting for environmental health (temperature, humidity, power draw, UPS state) and drive rapid response to hardware and facility incidents.
Define and execute preventive maintenance schedules and hardware lifecycle processes, including RMA coordination with vendors and on-site repair.
Lead capacity planning cycles in lockstep with the hardware roadmap, forecasting power, space, and network needs 6–18 months out and translating those forecasts into facility agreements and procurement plans.
Establish and enforce physical security procedures, access control policies, and audit trails across all data center sites.

Benefits

Medical, dental, and vision packages with generous premium coverage
$500 per month credit for waiving medical benefits
Housing subsidy of $2k per month for those living within walking distance of the office
Relocation support for those moving to San Jose (Santana Row)
Various wellness benefits covering fitness, mental health, and more
Daily lunch and dinner in our office
Unlimited compute budget subject to ROI justification
