Sr. Machine Learning Engineer

Illumio•Sunnyvale, CA

51d•Onsite

About The Position

Illumio is the leader in ransomware and breach containment, redefining how organizations contain cyberattacks and enable operational resilience. Powered by the Illumio AI Security Graph, our breach containment platform identifies and contains threats across hybrid multi-cloud environments, stopping the spread of attacks before they become disasters. Recognized as a Leader in the Forrester Wave™ for Microsegmentation, Illumio enables Zero Trust, strengthening cyber resilience for the infrastructure, systems, and organizations that keep the world running.

Our Engineering team is shaping the future of cybersecurity. We thrive on visionary leadership, autonomy, and ownership, fostering a culture of innovation that propels us forward in the ever-evolving cybersecurity landscape. As a leader in Zero Trust Segmentation, we are redefining security for a world facing unprecedented cyber threats. You’ll work with a highly scalable SaaS service built using cloud-native technologies while simultaneously shipping the solution on-premises.

Our guiding philosophy in Engineering is to get things right by practicing disciplined engineering, staying focused, not cutting corners, and of course having fun while we are at it. We believe in enabling ownership at all levels of the organization and empowering teams. If you thrive in this culture, come join us!

As a Senior Software Engineer, you will architect high-scale distributed systems that process massive data volumes to fuel our Agentic AI ecosystem. You will lead the development of autonomous agents that don’t just provide analytics but take action, driving complex automation and insights for our enterprise customers.

Requirements

5–8 years of experience in backend engineering using Java, Python, or Go.
Expertise in distributed systems, asynchronous architectures (Kafka), and large-scale data processing (Spark/Flink).
Hands-on experience with agentic frameworks (e.g., AutoGen, CrewAI, or custom orchestration layers), RAG, MCP, model fine-tuning, and prompt engineering.
Experience with agentic observability (e.g., Langfuse) and evaluation frameworks for testing and resilience.

Nice To Haves

Expertise in building reusable Terraform modules and managing complex multi-region cloud deployments.
Deep experience in indexing strategies (HNSW vs IVF) and performance tuning for high-concurrency vector databases at scale.
Experience with LLM deployment optimization (e.g., vLLM, TensorRT-LLM) or managing proprietary model inference endpoints.

Responsibilities

Architect and optimize high-throughput, event-driven systems using Apache Kafka to handle real-time data flows.
Build and maintain large-scale data pipelines using Apache Spark or Flink to provide the high-volume analytics that power our AI.
Design sophisticated AI Agents capable of autonomous planning, memory management, and high-reliability tool-use across distributed environments.
Lead the architectural design of containerized services on Kubernetes, ensuring high availability and scalability across Cloud Infrastructure (AWS/Azure/GCP).

Benefits

This position involves access to software/technology that is subject to U.S. export controls. Any job offer made will be contingent upon the applicant’s capacity to serve in compliance with U.S. export controls.
