Site Reliability Engineer (SRE) - (AWS / Platform Support)

Diversified Services Network, Inc.
Irving, TX
Hybrid

About The Position

Diversified Services Network, Inc. (DSN) is seeking a full-time Site Reliability Engineer (SRE) – (AWS / Platform Support) to join our team in Chicago, IL; Peoria, IL; or Irving, TX. We offer a hybrid schedule, full benefits, PTO, 401(k), and more. If you're looking to grow your technical career within an extremely reputable, stable Fortune 500 company, let's talk!

Requirements

  • Experience supporting production-grade, customer-facing platforms in complex, multi‑team environments
  • A demonstrated ownership mindset, taking accountability for service stability, incident outcomes, and follow-through beyond initial investigation
  • Strong understanding of AWS Kinesis streaming and messaging services, containerized and serverless compute using Fargate and Lambda, and CI/CD pipeline implementation using Azure DevOps
  • Experience using ServiceNow for incident management and Azure DevOps for tracking features, user stories, etc.
  • Proven ability to partner effectively with engineering, product, and platform teams to resolve issues and improve operational efficiency
  • Experience driving root cause analysis and continuous improvement, turning incidents into long-term reliability gains
  • Strong understanding of operational readiness standards, including monitoring, alerting, and runbooks
  • Comfort operating in on-call or escalation roles, maintaining composure and clear communication during high-impact incidents
  • Ability to identify gaps in processes or tooling and proactively improve support models, documentation, or workflows
  • Experience working within enterprise ITSM frameworks
  • Strong communication skills, with the ability to translate technical issues into clear status and impact updates for stakeholders

Nice To Haves

  • Degree not required, but nice to have

Responsibilities

  • Own incident tickets through the full lifecycle, from initial triage to resolution and closure
  • Collaborate with engineering, platform, product, and operations teams to diagnose issues and coordinate fixes
  • Communicate incident status, impact, and resolution progress to stakeholders
  • Lead or contribute to root cause analysis and ensure follow-up actions are identified and tracked
  • Ensure platform reliability through monitoring, alerting, security, and operational best practices
  • Respond to and manage production incidents impacting AWS services and APIs
  • Drive reliability, stability, and operational readiness improvements across cloud platforms
  • Understand end‑to‑end technical and business flows to support production services effectively
  • Develop, maintain, and improve clear, actionable runbooks for operational support
  • Lead knowledge transfer sessions to ensure support teams are ready for production support

Benefits

  • 401(k)
  • Dental insurance
  • Vision insurance
  • Disability insurance
  • Employee assistance program
  • Health insurance
  • Health savings account
  • Life insurance
  • Paid time off
  • Paid holidays