Site Reliability Architect

Qode • Texas, TX

About The Position

We are seeking a highly experienced Senior Consultant / SRE Architect to lead the strategy, design, and implementation of enterprise-wide observability and reliability frameworks supporting business-critical transaction flows across distributed systems. In this role, you will act as a thought leader and architect, driving the end-to-end design, architecture, and implementation of scalable, resilient, and secure cloud-native platforms on AWS. You will partner with engineering, architecture, and business stakeholders to define standards, influence technical direction, and implement scalable observability solutions. This is a high-impact role focused on transforming SRE maturity, improving advisor experience, and enabling proactive, data-driven operations through modern observability practices. The ideal candidate is passionate about SRE, observability, and system design, with a proven ability to drive large-scale transformation initiatives.

Requirements

10+ years of experience in Site Reliability, Observability, Production Support, Cloud Architecture, or related roles, with a strong focus on architecture and strategy
Deep hands-on expertise with observability platforms such as Dynatrace, ELK, Datadog, Splunk, OpenTelemetry, Jaeger
Strong understanding of microservices architecture, APIs, and distributed systems
Proficiency in programming/scripting (e.g., Python, Go, Java) for automation and integration
Strong hands-on experience with AWS services, including Compute & Networking (VPC, EC2, ECS/EKS, Lambda); Databases (RDS, Aurora, DynamoDB); Storage & CDN (S3, CloudFront); and Security (IAM, KMS, Security Groups, NACLs)
Proven experience designing multi-account, multi-region AWS architectures
Deep understanding of cloud networking and distributed systems; security and compliance best practices; and scalability, resiliency, and fault-tolerant design patterns
Hands-on expertise with Terraform (or similar IaC tools)
Experience with monitoring and observability tools (CloudWatch, Prometheus, Grafana, etc.)
Strong experience with DevSecOps principles and CI/CD pipelines
Excellent problem-solving and analytical skills
Demonstrated ability to lead cross-functional initiatives and influence technical direction

Nice To Haves

AWS Certifications (e.g., Solutions Architect – Associate or Professional)
Experience working in financial services, banking, or regulated environments
Background in Site Reliability Engineering (SRE) practices and production support models

Responsibilities

Design, architect, and build cloud-native infrastructure and application services on AWS
Lead end-to-end infrastructure design for application platforms, microservices, and shared services
Implement and manage Infrastructure as Code (IaC) using Terraform
Design and maintain highly available, scalable, secure, and cost-optimized AWS architectures
Troubleshoot and resolve complex infrastructure and application service issues
Provide architectural guidance and technical leadership across engineering teams
Drive adoption of DevSecOps best practices across the SDLC
Establish and enhance monitoring, observability, and alerting frameworks
