AI Platform Engineer

General Dynamics Information Technology
MacDill AFB (FLC007), FL, USA
Onsite

About The Position

As an AI Platform Engineer (LLM & MLOps), the work you’ll do at GDIT will be impactful to the mission of USCENTCOM. You will play a crucial role in the design, deployment, and operation of secure, scalable AI inference and orchestration platforms supporting USCENTCOM’s Data Analytical Environment (DAE) and AI environment. This role focuses on platform reliability, workflow stability, and operationalizing commercial LLMs in on-premises and hybrid environments. The engineer will work with GPU-enabled Kubernetes clusters, model serving frameworks, vector databases, and secure APIs to enable Retrieval-Augmented Generation (RAG) and agent-based AI workflows. This position does not focus on model training or AI research; instead, it emphasizes execution, integration, and platform resilience. This role supports the evolution of enterprise AI capabilities from foundational platforms to reusable, governed agent-based services.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or related technical field (or equivalent experience)
  • DoW Directive 8140 compliant
  • 8+ years of related experience
  • Strong experience with Kubernetes, containerization (Docker/Podman), and GPU scheduling.
  • Hands-on experience deploying LLM inference services (commercial or open-source).
  • Proficiency with Python and API development for platform services.
  • Experience integrating vector databases (e.g., FAISS, Milvus, Weaviate, OpenSearch).
  • Familiarity with MLOps toolchains (MLflow, CI/CD pipelines, artifact registries).
  • Experience deploying and operating systems in secure DoW environments.
  • Knowledge of monitoring/logging stacks (Prometheus, Grafana, ELK/Loki).
  • Active Secret clearance required; TS/SCI (or TS/SCI eligibility) preferred
  • US citizenship required

Nice To Haves

  • Experience with RAG or agent-based AI architectures.
  • Familiarity with Kubernetes-native workflow engines (Argo, Kubeflow).
  • Exposure to cost tracking or usage metering for shared compute platforms.
  • Understanding of DoW AI governance, ethical AI, and responsible deployment.

Responsibilities

  • Design, deploy, and maintain GPU-enabled Kubernetes environments for AI inference and orchestration.
  • Operationalize commercial LLM inference services using frameworks such as Text Generation Inference (TGI), KServe, FastChat, Triton, or similar.
  • Integrate vector databases and knowledge repositories to support RAG and graph-augmented LLM workflows.
  • Build and maintain secure REST APIs for AI job submission, inference requests, and workflow orchestration.
  • Implement MLOps and platform lifecycle practices, including model versioning, containerization, CI/CD, and reproducibility.
  • Enforce multi-tenant isolation, RBAC, namespace quotas, and resource controls across teams.
  • Implement monitoring, logging, and alerting for AI services, GPU utilization, and workflow health.
  • Support secure deployment in air-gapped, on-prem, and hybrid environments, adhering to DoW security requirements.
  • Collaborate with platform, automation, and data teams to align AI capabilities with mission workflows.
  • Support prompt, rule, and heuristic-based agents by ensuring reliable inference, retrieval, and context delivery.
  • Maintain conversation-aware context pipelines used for tagging and classification agents.

Benefits

  • Comprehensive benefits and wellness packages
  • 401K with company match
  • Competitive pay
  • Full flex work weeks where possible
  • Variety of paid time off plans, including vacation, sick and personal time, holidays, paid parental, military, bereavement and jury duty leave.
  • Short and long-term disability benefits
  • Life, accidental death and dismemberment, personal accident, critical illness and business travel and accident insurance