About The Position

The Senior Site Reliability Engineer at T-Mobile plays a crucial role in enhancing system reliability and resilience, facilitating faster and more efficient software development and deployment. They use strong problem-solving and analytical skills to automate processes, reducing manual effort and preventing operational incidents. Their expertise in programming and scripting languages, incident response management, and various tech tools contributes to the robustness and efficiency of our systems. By continuously learning new skills and technologies, they adapt to changing circumstances and drive innovation. Their work contributes significantly to the stability and performance of T-Mobile's digital infrastructure.

The Sr Site Reliability Engineer is responsible for ensuring the stability, security, and performance of the Network Supply Chain ecosystem, including SC Digital, o9 Order Management & Planning Systems, the Network Asset Lifecycle Management system, the SAP ERP system, and 3PL Systems. While this position includes hands-on proactive maintenance, the majority of the time is spent on technical leadership: driving the vision for stability and security, proposing optimal solutions for improving efficiency, coaching engineers and junior team members to follow best practices, and participating in advanced troubleshooting of production and pre-production systems. They own production as well as non-production environments and actively collaborate on architectural, technological, and infrastructural discussions for both the current ecosystem and future strategy.

Requirements

  • 4-7 years of progressive experience in software engineering/maintenance across multiple products, systems, and/or platforms, coupled with strong business acumen.
  • 4-7 years of experience in enterprise applications, middle-tier services, databases, storage, distributed computing, virtualization, and/or application technology.
  • Experience working in an Agile and DevOps environment.
  • Experience in one or more of: JavaScript, Java, .NET, API Gateway, MongoDB, Oracle, Spring Boot, Angular, etc.
  • Experience with Continuous Integration/Continuous Delivery tools, such as Jenkins and CloudBees, and other automation tools.
  • Experience with DevOps tools, such as Ansible, Chef, and Puppet. Experience with Docker, Kubernetes, and Deep.io is preferable.
  • Experience with APM tools, like AppDynamics, OTel, and Splunk.
  • Incident Management: Understanding of incident response management and operational support. (Required)
  • Experience with designing and maintaining CI/CD pipelines. (Required)
  • Bachelor's Degree in Computer Science, Engineering, or related field (Preferred)
  • Master's/Advanced Degree in Computer Science, Engineering, or related field (Preferred)
  • 4-7 years - Working in operations or DevOps environments
  • 4-7 years - Troubleshooting customer-related issues and managing customer relationships
  • 4-7 years - Developing software solutions using Python or similar programming languages

Nice To Haves

  • Cloud - AWS/Azure
  • This certification validates technical expertise in provisioning, operating, and managing distributed application systems on the AWS platform. (Preferred)
  • Certified Kubernetes Administrator
  • This certification validates the skills required for day-to-day administration of Kubernetes environments. (Preferred)
  • Telemetry - OTel, Argos, Splunk
  • This certification validates the ability to efficiently develop and deploy telemetry using T-Mobile's preferred tools, such as OTel, Argos, or Splunk. (Preferred)

Responsibilities

  • Utilizes fluent knowledge and skill in emerging DevOps-centric automation tools and technologies for CI/CD, configuration management, etc. for non-prod environments.
  • Manages Network Supply Chain production and non-production environments for SC Digital Layer, o9 Planning and Order Management systems, Network Asset Lifecycle Management system (CATS and SiteHound), SAP ERP systems, and 3PL Systems.
  • Performs environment management, automated server provisioning, and pipeline configuration (VMs).
  • Delivers software to improve the availability, scalability, latency, and efficiency of T-Mobile's services.
  • Creates, manages, and uses dashboards for continuous monitoring and health checks of applications and the underlying infrastructure; improves the quality of services using the monitoring feedback for non-production environments.
  • Contributes to future improvements of software delivery processes and operations, e.g., cloud enablement, and use of microservices with containerization.
  • Relationship and People Management: Mentors/guides other Systems Reliability Engineers and vendor resources as needed.
  • Also responsible for other duties and projects as assigned by business management.

Benefits

  • All team members receive a competitive base salary and compensation package - this is Total Rewards.
  • Employees enjoy multiple wealth-building opportunities through our annual stock grant, employee stock purchase plan, 401(k), and access to free, year-round money coaches.
  • We cover all of the bases, offering medical, dental and vision insurance, a flexible spending account, 401(k), employee stock grants, employee stock purchase plan, paid time off and up to 12 paid holidays - which total about 4 weeks for new full-time employees and about 2.5 weeks for new part-time employees annually - paid parental and family leave, family building benefits, back-up care, enhanced family support, childcare subsidy, tuition assistance, college coaching, short- and long-term disability, voluntary AD&D coverage, voluntary accident coverage, voluntary life insurance, voluntary disability insurance, and voluntary long-term care insurance.
  • Eligible employees can also receive mobile service & home internet discounts, pet insurance, and access to commuter and transit programs!