Etched is building the world’s first AI inference system purpose-built for transformers - delivering over 10x higher performance and dramatically lower cost and latency than a B200. With Etched ASICs, you can build products that would be impossible with GPUs, like real-time video generation models and extremely deep & parallel chain-of-thought reasoning agents. Backed by hundreds of millions from top-tier investors and staffed by leading engineers, Etched is redefining the infrastructure layer for the fastest growing industry in history. Etched’s Inference SW team enables optimal mapping of models to Sohu’s dataflow architecture and serving requests across multiple chips, hosts and racks. We are seeking a highly skilled and motivated engineer to formalize and optimize our collectives (e.g. Send/Recieve, AllReduce, Broadcast, etc.). You’ll build SW enabling frontier inference performance to satisfy exponentially growing serving demand. In this role, your core focus will be working across systems and research to realize Mixture of Expert (MoE) architectures on Sohu’s system. You will play a key role in scaling out Sohu’s nascent runtime, with a focus on collectives.
Stand Out From the Crowd
Upload your resume and get instant feedback on how well it matches this job.
Job Type
Full-time
Career Level
Mid Level
Education Level
No Education Listed
Number of Employees
51-100 employees