Sr. Staff AI Engineer - On-Prem AI Infrastructure & Agentic Systems

SK hynix memory solutions America Inc.•San Jose, CA

$140,000 - $165,000•Onsite

About The Position

We are seeking a hands-on AI Engineer to design, deploy, and maintain on-prem AI infrastructure and build agentic AI systems that drive real-world automation. You’ll be responsible for setting up scalable AI environments, implementing RAG pipelines, fine-tuning embedded models, and architecting AI agents that operate autonomously in enterprise settings. This role sits at the intersection of AI systems engineering and applied ML — you’ll bridge infrastructure, model deployment, and agent logic.

Requirements

2+ years of experience in AI/ML engineering, with hands-on deployment of AI systems on-prem or private cloud.
Proven experience building agentic AI systems — including state management, tool integration, and multi-step reasoning.
Strong working knowledge of RAG architectures — chunking, retrieval, re-ranking, evaluation metrics.
Experience with model fine-tuning (LoRA, QLoRA, full fine-tuning) and embedding models for retrieval.
Familiarity with Model Control Protocols (MCP) or similar governance frameworks (model versioning, access control, audit trails).
Proficiency in Python, Linux, Docker/Kubernetes, and vector databases (e.g., Milvus, Qdrant, Pinecone).
Experience with AI serving frameworks (vLLM, TGI, Triton, Ollama, etc.).

Nice To Haves

Experience deploying AI in enterprise storage or hardware-adjacent environments.
Background in systems engineering or QA automation — bonus if you’ve used AI to automate testing or validation.
Familiarity with embedded AI or edge inference (ONNX, TensorRT, GGUF, etc.).
Experience with AI agent frameworks (LangGraph, AutoGen, BabyAGI, etc.).
Knowledge of AI observability tools (LangSmith, Weights & Biases, Prometheus/Grafana for AI).
Because we are a storage company, knowledge of the storage area (e.g., NVMe) is a plus.

Responsibilities

Design and deploy on-prem AI infrastructure — including GPU clusters, model serving (e.g., vLLM, TGI, Triton), vector DBs (e.g., Milvus, Qdrant, FAISS), and orchestration (Kubernetes, Helm, Docker).
Build and optimize RAG pipelines — including document chunking, retrieval strategies (hybrid, re-ranking), and evaluation of retrieval accuracy and latency.
Develop agentic AI systems — design stateful agents with memory, tool use, and planning capabilities (e.g., using LangGraph, AutoGen, or custom frameworks).
Fine-tune and deploy embedded models — work with LoRA, QLoRA, or full fine-tuning for domain-specific tasks; optimize for edge/on-device inference.
Implement Model Control Protocols (MCP) — ensure model governance, versioning, access control, and monitoring for production AI systems.
Collaborate with product and engineering teams to integrate AI capabilities into enterprise workflows — especially in storage, QA, or systems engineering contexts.
Automate and monitor AI pipelines — build CI/CD for model deployment, logging, and performance tracking.