SRE Practice Architect II

TEKsystems, Dallas, TX
$148,200 - $222,400, Remote

About The Position

Think of TEKsystems Global Services (TGS) as the growth solution for enterprises today. We drive growth through technology, strategy, design, execution, and operations, always with the customer first, for bold business leaders. We deliver cloud, data, and customer experience solutions. We partner with leading cloud, design, and business intelligence platforms to strengthen our expertise. We value deep relationships, dedication to serving others, and inclusion. We deliver positive outcomes for our people and our business, and we keep our commitments and act in line with our words. We create opportunities for people to find fulfillment through career success.

Ready to join us? Here’s what the opportunity supported through our TGS Talent Acquisition Team requires:

We are seeking a Principal Architect to lead the technical vision, architecture, and evolution of our Kubernetes-based infrastructure platforms supporting large-scale, GPU-enabled workloads. This role is responsible for defining end-to-end platform strategy across on-premises and cloud environments, driving architectural standards, and partnering closely with engineering, SRE, and operations teams to deliver highly reliable, scalable infrastructure-as-a-service to internal customers.

As a Principal Architect, you will operate at both the systems and organizational level: setting long-term technical direction, influencing platform adoption, and ensuring architectural decisions align with business priorities, reliability goals, and future growth.

Requirements

  • 10+ years of experience in platform engineering, infrastructure architecture, SRE, or distributed systems roles.
  • Deep expertise in Kubernetes architecture, including on‑prem deployments and production-scale environments.
  • Strong understanding of Linux systems internals, performance tuning, and troubleshooting.
  • Advanced knowledge of networking fundamentals (L3/L4, DNS, load balancing, VPC networking).
  • Proficiency in one or more programming languages such as Go, Python, or Bash, with experience designing automation frameworks or platform services.
  • Proven experience in architecting large-scale distributed systems with high reliability requirements.
  • Demonstrated ability to influence technical direction across multiple teams without direct authority.
  • Bachelor’s degree in Computer Science or a related technical field (or equivalent experience).
  • AWS / Azure / GCP
  • Python
  • Linux
  • Puppet / Chef / Ansible
  • Terraform
  • Docker / Kubernetes
  • CI/CD (automation, metrics)
  • Observability (Datadog / Dynatrace / Sysdig / Aqua)
  • SIEM

Nice To Haves

  • Experience with GPU workload management and scheduling (Slurm, Run:ai).
  • Architecture experience supporting multi-cluster, multi-tenant Kubernetes at scale.
  • Familiarity with distributed storage systems such as Lustre or VAST Data, particularly in HPC or ML environments.
  • Experience designing platforms for observability at scale.
  • Contributions to open-source projects related to Kubernetes, cloud-native ecosystems, or HPC tooling.
  • Prior experience in environments where the majority of infrastructure is on‑prem, with hybrid cloud integration.

Responsibilities

  • Own the end-to-end architectural vision for Kubernetes platforms spanning on‑prem and cloud environments, with a strong emphasis on scalability, resiliency, and operational excellence.
  • Define and evolve reference architectures for:
      - Multi-cluster and multi-tenant Kubernetes environments
      - GPU-enabled workloads and high-performance computing (HPC) use cases
      - Hybrid infrastructure (on‑prem + AWS)
  • Establish architectural standards, design principles, and best practices for platform services, networking, security, and observability.
  • Act as a technical authority and advisor across SRE, platform engineering, Cloud Foundations Automation (CFA), and service teams.
  • Lead architecture reviews and guide teams on complex design decisions involving distributed systems, networking, and workload orchestration.
  • Mentor senior engineers and architects, raising the overall technical bar of the organization.
  • Drive alignment across teams by translating business needs into scalable technical solutions.
  • Provide architectural oversight for:
      - Kubernetes cluster lifecycle automation (provisioning, upgrades, scaling)
      - CI/CD-driven application and platform deployments
      - Helm, Kustomize, and platform-as-code approaches
  • Guide design decisions for workload managers and schedulers (e.g., Slurm, Run:ai) in GPU-heavy environments.
  • Influence strategies for infrastructure automation using Terraform and custom tooling.
  • Partner with SRE and SRO teams to ensure platforms meet availability, performance, and supportability targets.
  • Define observability architecture across metrics, logging, and tracing for complex distributed systems.
  • Ensure operational readiness by building an architecture that supports effective troubleshooting, runbooks, and handoffs to operations teams.
  • Enable internal customers by delivering infrastructure as a service that is reliable, well-documented, and easy to consume.
  • Support new strategic initiatives by architecting platforms and workflows for incoming projects.
  • Balance innovation with operational stability, ensuring architectural decisions scale with organizational growth.
  • Architect and evolve a centralized Kubernetes platform used by internal engineering teams.
  • Ensure successful onboarding and long-term reliability of services developed by the CFA team.
  • Enable rapid delivery of new projects by providing well-architected, production-ready infrastructure.
  • Serve as a strategic partner to SRE and SRO teams, ensuring platforms are operationally sound and supportable at scale.

Benefits

  • Medical, Dental, and Vision
  • Critical Illness, Accident, and Hospital
  • 401(k) Retirement Plan – Pre-tax and Roth post-tax contributions available
  • Life Insurance (Voluntary Life and AD&D for employee and dependents)
  • Short and Long-Term Disability
  • Health Spending Account (HSA)
  • Transportation Benefits
  • Employee Assistance Program
  • Time Off/Leave (PTO, Vacation or Sick Leave)

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Number of Employees: 501-1,000

© 2024 Teal Labs, Inc