Site Reliability Architect

Qode
Texas, TX

About The Position

We are seeking a highly experienced Senior Consultant / SRE Architect to lead the strategy, design, and implementation of enterprise-wide observability and reliability frameworks supporting business-critical transaction flows across distributed systems. In this role, you will act as a thought leader and architect, driving the end-to-end design, architecture, and implementation of scalable, resilient, and secure cloud-native platforms on AWS. You will partner with engineering, architecture, and business stakeholders to define standards, influence technical direction, and implement scalable observability solutions. This is a high-impact role focused on transforming SRE maturity, improving advisor experience, and enabling proactive, data-driven operations through modern observability practices. The ideal candidate is passionate about SRE, observability, and system design, with a proven ability to drive large-scale transformation initiatives.

Requirements

  • 10+ years of experience in Site Reliability, Observability, Production Support, Cloud Architecture or related roles, with a strong focus on architecture and strategy
  • Deep hands-on expertise with observability platforms such as Dynatrace, ELK, Datadog, Splunk, OpenTelemetry, Jaeger
  • Strong understanding of microservices architecture, APIs, and distributed systems
  • Proficiency in programming/scripting (e.g., Python, Go, Java) for automation and integration
  • Strong hands-on experience with AWS services, including:
      • Compute & Networking: VPC, EC2, ECS/EKS, Lambda
      • Databases: RDS, Aurora, DynamoDB
      • Storage & CDN: S3, CloudFront
      • Security: IAM, KMS, Security Groups, NACLs
  • Proven experience designing multi-account, multi-region AWS architectures
  • Deep understanding of:
      • Cloud networking and distributed systems
      • Security and compliance best practices
      • Scalability, resiliency, and fault-tolerant design patterns
  • Hands-on expertise with Terraform (or similar IaC tools)
  • Experience with monitoring and observability tools (CloudWatch, Prometheus, Grafana, etc.)
  • Strong experience with DevSecOps principles and CI/CD pipelines
  • Excellent problem-solving and analytical skills
  • Demonstrated ability to lead cross-functional initiatives and influence technical direction

Nice To Haves

  • AWS Certifications (e.g., Solutions Architect – Associate or Professional)
  • Experience working in financial services, banking, or regulated environments
  • Background in Site Reliability Engineering (SRE) practices and production support models

Responsibilities

  • Design, architect, and build cloud-native infrastructure and application services on AWS
  • Lead end-to-end infrastructure design for application platforms, microservices, and shared services
  • Implement and manage Infrastructure as Code (IaC) using Terraform
  • Design and maintain highly available, scalable, secure, and cost-optimized AWS architectures
  • Troubleshoot and resolve complex infrastructure and application service issues
  • Provide architectural guidance and technical leadership across engineering teams
  • Drive adoption of DevSecOps best practices across the SDLC
  • Establish and enhance monitoring, observability, and alerting frameworks