Software Engineering SMTS

Salesforce - Indianapolis, IN

About The Position

The Site Reliability Engineering team is part of the Digital Enterprise Technology Platform Engineering organization and is responsible for maintaining and developing the IT monitoring and log analytics platform that keeps Enterprise IT services reliable. We're looking for a self-starter who can take ownership of tasks, work under pressure, and balance multiple assignments while maintaining a positive outlook. You'll contribute ideas and feedback on the vision for IT monitoring systems and provide expertise for IT projects and enhancements across various IT organizations.

Requirements

  • Bachelor's degree in Computer Science or related technical field, or equivalent experience in technical leadership
  • 5-8 years of experience designing and implementing distributed systems to handle large-scale telemetry and log data
  • Demonstrable ability in Bash/PowerShell, Python, and JavaScript (Node.js), especially program comprehension
  • Understanding of REST-based API design principles and best practices
  • Experience with server administration (Linux and Windows)
  • Knowledge of monitoring tools like Zabbix, Splunk, Grafana, New Relic, or ThousandEyes
  • Experience with AWS public cloud and VMware vSphere
  • Knowledge of configuration management and orchestration tools like Puppet, Ansible, or Terraform
  • Experience with Docker and containerized applications
  • Strong troubleshooting and debugging skills (reading log files, analyzing memory leaks)
  • Strong analytical skills and ability to gather and synthesize data for review
  • Ability to problem-solve in a fast-paced environment and shift gears effectively
  • Subject matter expertise in at least one monitoring and telemetry product

Nice To Haves

  • Experience with AI and machine learning applications in operations
  • Experience with predictive monitoring and auto-healing solutions
  • Master's degree in Computer Science or related field
  • Experience translating technical concepts into visual representations

Responsibilities

  • Manage, assess, plan, and support core observability platform operations
  • Lead process changes and implementations related to the monitoring platform
  • Provide escalation support for configuration and platform issues, participating in on-call schedules to resolve major incidents
  • Collaborate with key stakeholders (Service Managers, Product Managers, Application Architects, Business Support, and Operations) to gather and develop requirements
  • Develop AI, automation, and integrations to deliver custom monitoring requirements
  • Work with third-party vendors and partners to address platform-related enhancements
  • Support and manage the introduction of new monitoring tools and orchestrate migrations as aging software is retired
  • Present reports on monitoring event metrics and correlation metrics to the Enterprise Operations team periodically
  • Work within an Agile Scrum methodology and provide guidance to junior team members
  • Create standard operating procedures and share them with the team for effective execution

Benefits

  • Time-off programs
  • Medical, dental, vision, and mental health support
  • Paid parental leave
  • Life and disability insurance
  • 401(k)
  • Employee stock purchase program