Platform Engineering Architect - AI Automation

Clarity Innovations • Jessup, MD

About The Position

Clarity Innovations is a trusted national security partner, dedicated to safeguarding our nation’s interests and delivering innovative solutions that empower the Intelligence Community (IC) and Department of Defense (DoD) to transform data into actionable intelligence, ensuring mission success in an evolving world. Our mission-first software and data engineering platform modernizes data operations, utilizing advanced workflows, CI/CD, and secure DevSecOps practices. We focus on challenges in Information Warfare, Cyber Operations, Operational Security, and Data Structuring, enabling end-to-end solutions that drive operational impact. We are committed to delivering cutting-edge tools and capabilities that address the most complex national security challenges, empowering our partners to stay ahead of emerging threats and ensuring the success of their critical missions. At Clarity, we are people-focused and set on being a destination employer for top talent, offering an environment where innovation thrives, careers grow, and individuals are valued. Join us as we continue to lead innovation and tackle the most pressing challenges in national security.

As a Platform Engineering Architect, you will focus on operationalizing, securing, and maturing artificial intelligence capabilities by building and maintaining the AI "path-to-prod" and a scalable AI run stack: the AI "plumbing." Your work centers on providing a unified interface for AI model access, integrating foundational and reasoning models, AI agents, and generative AI. Key responsibilities include contributing to the Kubernetes (K8s)-based AI access platform, managing the deployment of core AI services, integrating frontier models (e.g., Claude, GPT) and local inference engines (e.g., vLLM), and designing MLOps pipelines. The role also requires providing expert recommendations for enterprise AI adoption, specifically in agentic orchestration and spec-driven development.

Requirements

7+ years of combined experience in DevSecOps, Platform Engineering, or SRE.
Deep expertise in Kubernetes cluster administration and development.
Advanced knowledge of Docker or equivalent container build tools.
Experience with Azure or AWS architecture and Infrastructure as Code (Terraform/Crossplane).
Proficiency in at least one backend or scripting language—ideally Python, Go, or Bash—to drive systems automation.
Solid understanding of OSI Layers 4–7, including VPC/VNet configuration, DNS, load balancing, and SSL/TLS management.
Practical experience with modern AI/ML frameworks and tooling such as PyTorch, Hugging Face, LangChain, vLLM, Ray, MLflow, or equivalent open-source ecosystems.
Hands-on experience deploying, scaling, and securing AI/ML workloads on Kubernetes, including GPU-enabled clusters, model-serving platforms, and distributed inference/training systems.
Experience building internal AI platforms or developer enablement tooling that supports model lifecycle management, experimentation, inference endpoints, and reproducible AI workflows.
Familiarity with MLOps concepts and tooling, including automated model deployment, versioning, evaluation, observability, rollback strategies, and CI/CD integration for AI systems.
Working knowledge of vector databases, embedding pipelines, retrieval-augmented generation (RAG), and semantic search architectures.
Understanding of AI security concerns including model isolation, data handling controls, prompt injection risks, supply-chain security, and governance requirements for sensitive or regulated environments.

Nice To Haves

Mastery of GitLab CI/CD (including Runners, Templates, and Security Scanners).
Hands-on experience with ArgoCD or equivalent declarative CD tools.
Proficiency with Helm for templating and deploying Kubernetes applications.
Possession of DoD 8570 certifications (e.g., Security+ or CASP+/SecurityX).
Experience with Service Mesh technologies (e.g., Istio, Cilium) and CNI plugins.
Experience configuring and managing Keycloak or other OIDC/SAML identity providers.
Experience with the Grafana/Prometheus stack or similar.
Functional knowledge of PostgreSQL and MySQL.
Experience maintaining data pipelines or high-throughput data infrastructure.
Background in threat modeling, vulnerability management, or SOC operations.
Experience designing or maintaining RESTful or RPC APIs.

Responsibilities

Operationalizing, securing, and maturing artificial intelligence capabilities by building and maintaining the AI "path-to-prod" and a scalable AI run stack.
Providing a unified interface for AI model access, integrating foundational and reasoning models, AI agents, and generative AI.
Contributing to the K8s-based AI access platform.
Managing deployment of core AI services.
Integrating frontier models (e.g., Claude, GPT) and local inference engines (e.g., vLLM).
Designing MLOps pipelines.
Providing expert recommendations for enterprise AI adoption, specifically in agentic orchestration and spec-driven development.
