Forward Deployment Engineer (Inference & RL POC)

Glint Tech Solutions, Mountain View, CA
Hybrid

About The Position

We're looking for a Forward Deployment Engineer (FDE) to work directly with customers and partners to design, deploy, and validate inference and reinforcement learning (RL) proofs of concept on GMI's GPU infrastructure. This is a high-impact, hybrid engineering role that sits at the intersection of platform engineering, applied ML, and customer success. You'll be embedded with customers during early-stage deployments, turning research ideas, datasets, and business requirements into working, performant systems on real GPU clusters. If you enjoy being close to users, debugging real systems, and shipping results fast (not just writing docs), this role is for you.

Requirements

  • Strong software engineering background (Python required; Go / Rust a plus)
  • Hands-on experience with ML inference or training systems
  • Familiarity with distributed systems and GPUs (multi-GPU, multi-node)
  • Comfort working directly with customers and ambiguous requirements
  • Ability to debug end-to-end systems (code, infra, networking, performance)

Nice To Haves

  • Experience with LLM inference frameworks (vLLM, SGLang, Ray Serve, Triton, etc.)
  • RL or post-training workflows (RLHF, RFT, SFT)
  • PyTorch, DeepSpeed, Megatron-LM, or similar
  • Kubernetes-based ML platforms
  • GPU performance profiling and optimization
  • Prior experience as a Forward Deployed Engineer, Solutions Engineer, ML Platform Engineer, or Applied Research Engineer

Responsibilities

  • Own customer POCs end-to-end
      • Deploy and optimize LLM inference, RL training, and post-training workflows on GMI clusters
      • Translate customer requirements into concrete system designs and experiments
  • Forward-deploy with customers
      • Work hands-on with research teams, startups, and enterprise customers
      • Debug performance, stability, and correctness issues in real environments
  • Inference deployment
      • Stand up and tune inference stacks (e.g. vLLM / SGLang / Ray Serve–style architectures)
      • Optimize latency, throughput, GPU utilization, and cost efficiency
  • RL & post-training POCs
      • Support RLHF / RFT / SFT workflows using customer-provided datasets
      • Integrate SDKs, training APIs, and cluster resources to shorten idea-to-experiment cycles
  • Performance & reliability
      • Diagnose GPU, networking, and distributed-system bottlenecks
      • Run benchmarks, profiling, and stress tests on multi-GPU / multi-node setups
  • Feedback loop to product
      • Feed real-world customer learnings back into GMI's platform, SDKs, and APIs
      • Help shape reference architectures, cookbooks, and best practices