Role Overview:
We’re looking for a hands-on technical leader to architect, fine-tune, and deploy on-device small language models (SLMs) for consumer security at scale. You’ll lead a focused team of 3–5 senior engineers while remaining deeply involved in the code and technical architecture.

Your core responsibility is building high-performance, privacy-preserving AI models that run directly on user devices (Mac, iOS, Android, Linux). You’ll own model optimization, fine-tuning for tool-use accuracy, evaluation frameworks, and cost-aware deployment strategies. While you won’t own the agent orchestration platform itself, you’ll work closely with it to ensure models behave correctly in multi-turn conversations and make reliable tool-calling decisions.

This role sits at the intersection of edge ML, applied LLMs, and production engineering. Success requires navigating real-world tradeoffs: latency vs. capability, privacy vs. accuracy, on-device vs. cloud execution, and cost vs. performance.

This is not a traditional director role. You’ll spend 60%+ of your time on technical architecture and implementation, with the remainder focused on mentoring senior engineers and setting technical direction.

This is a hybrid remote position based in one of our hub locations: Frisco, TX or San Jose, CA. You will be required to be onsite on an as-needed basis, typically 1–4 days per month. We are only considering candidates within a commutable distance of one of these locations and are not offering relocation assistance at this time.

About the role:
- Design and deploy small language models optimized for on-device inference (Mac, iOS, Android, Linux)
- Lead model optimization efforts including quantization, pruning, distillation, and efficient inference pipelines
- Fine-tune models to improve tool selection accuracy and conversational behavior in security-focused workflows
- Build evaluation frameworks to measure model efficacy, tool-calling accuracy, conversation quality, and safety in production
- Create synthetic data and workflow simulations to train and validate security-relevant conversations
- Partner closely with agent orchestration systems to optimize multi-turn dialogue behavior and state handling
- Implement cost-optimization strategies such as intelligent on-device vs. cloud routing, prompt caching, batching, and token efficiency
- Integrate cloud-based LLMs when deeper reasoning or broader context is required
- Build production ML systems that detect threats and protect users directly on-device
- Set technical standards and architectural direction for AI/ML across the security platform
- Mentor principal engineers and architects while remaining hands-on

About you:
- 10+ years of software engineering experience, with 5+ years focused on ML/AI
- Proven experience shipping ML models to production, with skills that transfer to deploying them on edge or mobile platforms
- Experience with conversational AI systems and tool/function-calling architectures
- Strong Python and systems programming skills (C++ or Rust) for performance-critical code
- Deep expertise in model optimization (INT4/INT8 quantization, pruning, distillation)
- Hands-on experience with PyTorch and at least one edge deployment framework (TensorFlow Lite, CoreML, ONNX Runtime, or llama.cpp)
- Experience building evaluation and benchmarking frameworks for ML systems

Preferred:
- Experience applying ML systems in security, safety, or other adversarial domains
- Master’s degree in CS, ML, or a related field (or equivalent practical experience)
Job Type: Full-time
Career Level: Director
Number of Employees: 1,001-5,000