About The Position

We are seeking a Machine Learning Engineer to lead the fine-tuning, optimization, and deployment of AI models for diverse tasks, with a strong emphasis on on-device inference. You will work on cutting-edge applications such as orchestration, planning, multi-agent coordination, and other intelligent decision-making systems. You will be responsible for adapting foundation models (LLMs, multimodal models) to specialized domains, making them fast, accurate, and efficient for resource-constrained environments while ensuring robustness and safety.

Requirements

  • 5+ years of experience in applied machine learning, including at least 3 years in LLM fine-tuning.
  • Proficiency in Python and the ML framework ecosystem (Hugging Face, PyTorch).
  • Strong understanding of transformer architectures, attention mechanisms, and PEFT techniques.
  • Experience with on-device inference optimization (OpenVINO, ONNX, QNN).
  • Familiarity with orchestration/planning architectures and techniques for AI assistants.
  • Track record of delivering production-ready ML solutions in latency-sensitive environments.

Nice To Haves

  • Experience with multi-agent systems or AI assistant orchestration.
  • Familiarity with advanced inference optimization techniques such as KV cache paging and FlashAttention.
  • Knowledge of common inference engines, including but not limited to llama.cpp and vLLM.

Responsibilities

  • Model Fine-Tuning & Adaptation: Fine-tune large language models, multimodal models, and task-specific models for orchestration, planning, and other defined workflows.
  • Design and run experiments to improve task accuracy, robustness, and generalization.
  • Explore and apply methods such as full fine-tuning, LoRA, QLoRA, and other parameter-efficient fine-tuning approaches.
  • Employ advanced techniques such as QAT, DPO, and GRPO to further improve model quality.
  • On-Device Optimization: Prune, quantize, and compress models (e.g., INT8, INT4, mixed precision) for CPUs, GPUs, NPUs, and edge accelerators.
  • Optimize models for low-latency inference using frameworks such as OpenVINO, ONNX Runtime, and QNN.
  • Data Pipeline & Deployment: Build robust data pipelines for domain-specific datasets, including synthetic data generation and annotation.
  • Define evaluation metrics, run evaluations, and analyze results.
  • Establish best practices for versioning, reproducibility, and continuous improvement of model performance.
  • AI Orchestration & Planning: Develop and refine models to support multi-step reasoning, tool orchestration, and decision planning.
  • Work with stakeholders on orchestrator architecture.
  • Collaborate with product and research teams to design intelligent, context-aware assistant capabilities.

Benefits

  • Health insurance
  • Dental insurance
  • Vision insurance
  • Long term/short term disability insurance
  • Employee assistance program
  • Flexible spending account
  • Life insurance
  • Generous time off policies, including: 4-12 weeks fully paid parental leave based on tenure
  • 11 paid holidays
  • Additional flexible paid vacation and sick leave