Private Cloud Infrastructure Architect

TEKsystems
Chandler, AZ
Hybrid

About The Position

The Private Cloud Infrastructure Architect within IAS leads the architecture and operational governance of AI‑ready private cloud platforms, supporting LLM and agentic AI implementations with GPU and accelerated compute infrastructure. This role is accountable for ensuring platforms meet enterprise standards for security, observability, resilience, auditability, and cost governance. A core responsibility is ownership of the Platform API Inventory and Collection Interval Validation Matrix, ensuring all critical infrastructure, platform, observability, and cost telemetry is inventoried, validated, and collected at appropriate intervals. The architect brings hands‑on FinOps experience from a large financial institution and owns per‑platform telemetry retention audits that enable warm‑up recovery and rapid restoration of operational readiness following incidents, maintenance, upgrades, or disaster recovery events.

We’re partners in transformation. We help clients activate ideas and solutions to take advantage of a new world of opportunity. We are a team of 80,000 strong, working with over 6,000 clients, including 80% of the Fortune 500, across North America, Europe and Asia. As an industry leader in Full-Stack Technology Services, Talent Services, and real-world application, we work with progressive leaders to drive change. That’s the power of true partnership.

We’re a leading provider of business and technology services. We accelerate business transformation for our customers. Our expertise in strategy, design, execution and operations unlocks business value through a range of solutions. Customers partner with us for our scale, full-stack capabilities and speed. We’re strategic thinkers and hands-on collaborators, helping customers capitalize on change and master the momentum of technology. We’re building tomorrow by delivering business outcomes and making positive impacts in our global communities. TEKsystems and TEKsystems Global Services are Allegis Group companies. Learn more at TEKsystems.com.

Requirements

  • 10+ years of experience in infrastructure architecture, platform engineering, or private cloud engineering in large enterprise environments.
  • Proven experience designing and operating hybrid data center infrastructure, including virtualization, containers, storage, and networking.
  • Hands‑on experience with GPU and accelerated compute platforms, including capacity planning and operational management.
  • Demonstrated ownership of enterprise observability and telemetry programs, including API inventory and collection governance.
  • Direct FinOps experience in a large organization, including cost allocation, showback/chargeback, and infrastructure unit economics.
  • Strong understanding of resilience, recovery, and operational readiness (“warm‑up”) dependencies.
  • Excellent communication and stakeholder‑management skills across technical and non‑technical organizations.

Nice To Haves

  • FinOps Certified Practitioner or Professional
  • Experience supporting AI/ML or LLM platforms in regulated environments

Responsibilities

  • Define and govern end‑to‑end architecture for private cloud infrastructure supporting AI/ML, including GPU and accelerated compute platforms.
  • Design reference architectures, standards, and guardrails for hybrid data center environments (private cloud and on‑prem).
  • Lead architectural oversight of GPU platforms, including cluster design, scheduling, capacity planning, monitoring, and lifecycle management.
  • Own the Platform API Inventory and Collection Interval Validation Matrix across the AI ecosystem.
  • Define and govern observability strategies for metrics, logs, traces, cost, capacity, and model‑serving telemetry, including data quality controls.
  • Apply FinOps practices to enable showback/chargeback, cost allocation, unit economics, and consumption governance for AI infrastructure.
  • Own telemetry retention audits to ensure compliance, incident analysis, and support for resilience and warm‑up recovery use cases.
  • Partner across engineering, operations, finance, security, risk, and audit teams to influence outcomes and align priorities.

Benefits

  • Medical, dental & vision
  • Critical Illness, Accident, and Hospital
  • 401(k) Retirement Plan – Pre-tax and Roth post-tax contributions available
  • Life Insurance (Voluntary Life & AD&D for the employee and dependents)
  • Short and long-term disability
  • Health Spending Account (HSA)
  • Transportation benefits
  • Employee Assistance Program
  • Time Off/Leave (PTO, Vacation or Sick Leave)