Infrastructure Services Director - TALON

Tyto Athene, LLC
Reston, VA (Onsite)

About The Position

Tyto Athene is searching for a high-caliber Infrastructure Services Director to spearhead the establishment and operation of our high-performance AI R&D Lab/Data Center, TALON (Technology Acceleration Lab for Operational Needs). This strategic role is critical to delivering high-quality, self-service infrastructure that empowers our AI R&D teams to rapidly develop and test mission-oriented solutions, including advanced defensive and mission cyber AI technologies. This leader must combine strategic planning, deep technical expertise in HPC and GPU systems, an unwavering commitment to CMMC compliance, and a strong focus on Site Reliability Engineering (SRE) and DevOps principles to ensure secure, efficient, and reliable service delivery. A core mandate is to manage the service catalog and implement processes that allow developers to "go fast" while adhering to strict security and operational guardrails.

Requirements

  • 10+ years of experience with:
    • Core infrastructure operations: Windows/Linux, virtualization, storage, backups, and disaster recovery, standardized via infrastructure-as-code and live dashboards.
    • HPC cluster system administration, preferably in rapid AI and cyber solution prototyping environments.
    • State-of-the-art GPU technologies and their integration into HPC environments (driver management, software stack tools, monitoring, workload scheduling).
    • InfiniBand, NVLink, NVQLink, and Spectrum-X (driver management, software stack tools, monitoring).
    • Container platforms (e.g., Apptainer, Docker, OpenShift, Kubernetes, EKS).
    • Familiarity and prior work experience with technologies such as Ansible, Git, Slurm, Zabbix, Prometheus, Grafana, and Docker.
    • Slurm or other cluster schedulers, and cluster configuration and management solutions.
    • NFS, SMB, and distributed object, file, and block storage management and configuration.
    • High-performance parallel filesystem management and configuration.
    • Installing and repairing servers and associated cluster hardware.
  • CMMC:
    • Experience devising a CMMC strategy and successfully attaining CMMC Level 3 accreditation for an AI-powered R&D lab serving a government contractor.

Responsibilities

  • Lead Data Center Hardware and Software Acquisition: Finalize labor requirements, and coordinate with OEMs, VARs, software vendors, and partners to build compute and transport infrastructure in the TALON lab.
  • Operationalize Data Center: Oversee the delivery, receipt, installation, racking and stacking, configuration, and integration of equipment, and bring the infrastructure into service.
  • Manage TALON Data Center in Dulles, VA: Apply DevOps principles to operations, configuration management, and the entire service management lifecycle for all assets within the TALON on-premises data center (including specialized GPU-based infrastructure), remote nodes, and cloud environments.
  • Attain CMMC Accreditation for TALON environments: Establish and drive the plan to attain CMMC accreditation for the TALON environments, including the data center, remote nodes, and cloud environments, while future-proofing the infrastructure strategy to accommodate future needs such as NIPR/SIPR/JWICS interconnects. The plan will support persisting customer-provided data sets and training models on them.
  • Cyber Network Strategy: Design, implement and operate network segments and associated infrastructure to securely meet the unique needs of TALON AI cyber projects, covering both defensive and mission cyber considerations
  • Serve as Technical Lead and administrator for TALON Data Center and TALON lab IT infrastructure
  • Maintain data center, audiovisual, Wi-Fi, software, and all other lab IT infrastructure.
  • Cloud platforms: Plan, provision, and optimize AWS/Azure/GCP (compute, networking, IAM, cost control); enforce guardrails and landing zones. Experience managing Impact Level (IL) 2/4/5/6 environments.
  • Networks & OT connectivity: Design and secure LAN, WAN, SD-WAN, and Wi-Fi networks and firewalls. Must have experience managing NIPR and SIPR networks, and high-level knowledge of JWICS networks.
  • Cybersecurity & compliance: Implement zero-trust controls, patching, identity, logging/SIEM, and audit readiness (NIST/ISO). Implement CMMC standards.
  • Service management: Own the service catalog, SLAs, capacity planning, vendor contracts, and budget.
  • Facilities interface: Coordinate with facilities on power, cooling, UPS/generators, and physical security for server rooms.

Benefits

  • Health/Dental/Vision
  • 401(k) match
  • Paid Time Off
  • STD/LTD/Life Insurance
  • Referral Bonuses
  • Professional Development Reimbursement
  • Parental Leave

What This Job Offers

Job Type

Full-time

Career Level

Director

Education Level

No Education Listed

Number of Employees

501-1,000 employees