Sr Engineer, Site Reliability

T-Mobile, Atlanta, GA

About The Position

Are you ready to join the Un-carrier movement? This role ensures the reliability and resilience of the digital infrastructure that supports highly critical new Credit and Collections project initiatives, while continuously driving innovation. It involves automating processes and reducing manual effort to prevent operational incidents and improve system performance. The role requires expertise in programming, scripting, incident response management, and various technical tools to maintain system robustness. Success is measured by system stability, incident reduction, and continuous improvement in operational efficiency. The work directly impacts organizational stability and customer experience by maintaining high-performing and reliable systems.

Requirements

  • Bachelor's Degree plus 3 years of related work experience OR advanced degree with 1 year of related work experience OR combination of education and experience deemed equivalent (Required)
  • Acceptable areas of study include Computer Science, Engineering or related field (Required)
  • Programming: Proficiency in programming and scripting languages such as Python and Bash. (Required)
  • Automation: Ability to automate processes and reduce manual effort. (Required)
  • Incident Management: Understanding of incident response management and operational support. (Required)
  • Experience designing and maintaining CI/CD pipelines. (Required)
  • Ability to learn new skills and technologies quickly and adapt to changing circumstances.
  • Understanding of system reliability and resilience principles.

Nice To Haves

  • 4-7 years working in operations or DevOps environments (Preferred)
  • 4-7 years troubleshooting customer-related issues and managing customer relationships (Preferred)
  • 4-7 years developing software solutions using Python or similar programming languages (Preferred)
  • Development and automation experience using Agentic AI and ML tools (Preferred)
  • Familiarity with Billing and Credit business applications and platforms (Preferred)
  • AWS Certified DevOps Engineer: This certification validates technical expertise in provisioning, operating, and managing distributed application systems on the AWS platform. (Preferred)
  • Certified Kubernetes Administrator: This certification validates the skills required for day-to-day administration of Kubernetes environments. (Preferred)
  • Google Cloud Certified - Professional DevOps Engineer: This certification validates the ability to efficiently develop and deploy applications using Google Cloud technologies and to manage operations. (Preferred)

Responsibilities

  • Improve system reliability and resilience by identifying issues and implementing preventive measures to reduce downtime
  • Automate processes to accelerate software development and deployment while minimizing manual interventions using sophisticated agentic AI methods and tools
  • Design and maintain GitLab CI/CD pipelines to automate build, test, and deployment processes across multiple environments
  • Conduct root cause analysis and collaborate with problem management to prevent incident recurrence and improve system operations
  • Apply problem-solving and analytical skills to prevent operational incidents and maintain system stability
  • Leverage programming, scripting, and incident response expertise to improve system robustness and efficiency
  • Continuously learn new skills and technologies to adapt to changing environments and drive innovation
  • Perform other duties and projects as assigned by business management

Benefits

  • Employees enjoy multiple wealth-building opportunities through our annual stock grant, employee stock purchase plan, 401(k), and access to free, year-round money coaches.
  • We cover all of the bases, offering medical, dental and vision insurance, a flexible spending account, 401(k), employee stock grants, employee stock purchase plan, paid time off and up to 12 paid holidays - which total about 4 weeks for new full-time employees and about 2.5 weeks for new part-time employees annually - paid parental and family leave, family building benefits, back-up care, enhanced family support, childcare subsidy, tuition assistance, college coaching, short- and long-term disability, voluntary AD&D coverage, voluntary accident coverage, voluntary life insurance, voluntary disability insurance, and voluntary long-term care insurance.
  • Eligible employees can also receive mobile service and home internet discounts, pet insurance, and access to commuter and transit programs!