Site Reliability Engineer

Smiley Technologies • Little Rock, AR

Hybrid

About The Position

Site Reliability Engineer (SRE)

Reliability, observability, and calm leadership when it matters most.

Smiley Technologies powers core banking platforms used by banks and credit unions across the United States. When something goes wrong, it’s not just an alert: it’s real customers, real transactions, and real impact.

We’re hiring a Site Reliability Engineer to replace a former senior SRE who recently moved into a leadership role. This is a mid-level SRE position, ideal for someone who already has hands-on SRE or Platform experience and is ready to grow into broader ownership over time.

This role sits within our Platform / DevOps Engineering team and works across nearly every technical group in the organization. You’ll be hands-on with observability, incident response, CI/CD, and reliability practices, and you’ll help make life easier for engineers across Smiley.

What Makes This Role Different

This SRE is a key Incident Command contributor. Critical incidents are rare (only a few per year), but when they happen, you’ll:
Run the call with clients
Make sure the right people are present
Keep communication clear, calm, and structured
Capture notes and drive strong post-incident learning

If you’re someone who stays composed under pressure and communicates clearly, this role will suit you well.

The Technical Environment

Hybrid infrastructure (on-prem + cloud)
Azure-first (AWS experience welcomed)
Containers: Docker, ACR, AKS
Infrastructure as Code and automation-driven workflows
Regulated, high-availability financial systems

Requirements

2+ years in an SRE, Platform Engineer, or DevOps role
Hands-on experience with APM / observability tools (Dynatrace strongly preferred; Datadog, New Relic, Prometheus/Grafana also relevant)
Experience with Azure or AWS
Experience supporting CI/CD pipelines
Experience with containers (Docker, AKS)
Working knowledge of Git, Terraform, Helm, Bash, PowerShell
Experience supporting REST APIs
Experience with .NET or Python (or similar)

Nice To Haves

Linux/Unix administration fundamentals
Performance troubleshooting across applications and databases (SQL Server, DB2, PostgreSQL)
Familiarity with WAFs, networking, and OWASP concepts
Experience with developer portals (Backstage.io a plus)
Financial services or regulated environments (helpful, not required)

Responsibilities

Work cross-functionally with Network, SecOps, DevSecOps, Platform, Developers, and Support
Own and evolve observability and monitoring, primarily using Dynatrace: dashboards, alerts, reporting, and adoption across teams
Help teams improve root cause analysis, retrospectives, and reliability practices
Support and improve CI/CD pipelines (GitHub Actions / Azure DevOps)
Maintain standards and documentation across the SDLC
Monitor and optimize cloud costs, infrastructure tiers, and capacity
Participate in Incident Command on-call rotation
Define and document SLIs, SLOs, SLAs, KPIs, and OKRs
Promote both Shift Left and Shift Right reliability thinking
Help Smiley continue its DevOps and Platform transformation