Site Reliability Engineer

LogicMonitor, Austin, TX
Hybrid

About The Position

LogicMonitor® is the AI-first hybrid observability platform powering the next generation of digital infrastructure. LogicMonitor delivers complete visibility and actionable intelligence across on-premises, cloud, and edge environments. By anticipating issues before they strike, optimizing resources in real time, and enabling faster, smarter decisions, LogicMonitor helps IT and business leaders protect margins, accelerate innovation, and deliver exceptional digital experiences without compromise.

Our customers love LogicMonitor's ability to bring cloud and traditional IT together into one view, as seen in minimal churn rates, expansion business, and exciting new customer references. In fact, LogicMonitor has received the highest Net Promoter Score of any IT Infrastructure Management provider. LogicMonitor also boasts high employee satisfaction. We have been certified as a Great Place To Work® and named one of BuiltIn's Best Places to Work for the seventh year in a row!

We are seeking a talented and experienced Site Reliability Engineer (SRE) to help ensure the uptime and reliability of our mission-critical systems. In this high-impact role, you’ll automate and streamline operational tasks, continuously looking for ways to improve performance, efficiency, and scalability. You’ll work closely with developers to provide infrastructure-focused feedback that enhances product performance within the LM environment. This is a unique opportunity to sharpen your SRE skill set and become an invaluable member of the core LM Operations team.

Requirements

  • 3+ years of experience in a Linux engineering role, preferably in a SaaS-based company.
  • Solid understanding of Linux system administration in distributed environments.
  • Experience with configuration management tools such as Chef, Puppet, or Ansible.
  • Experience with virtualization and container technologies (e.g., Docker, Kubernetes).
  • Programming/scripting experience (Python, Shell, Go).
  • Knowledge of security as it relates to Linux systems, applications, and networking.
  • High-level understanding of networking technologies, including routing, switching, firewalls, and iptables.
  • Able to work independently and self-direct projects.

Responsibilities

  • Maintain uptime of LogicMonitor’s SaaS-based platform and implement technical and process improvements to enhance system reliability.
  • Ensure the security and stability of the production environment through proactive monitoring and risk mitigation strategies.
  • Design, deploy, and manage scalable infrastructure and system integrations to support business growth and technical innovation.
  • Write code to automate infrastructure maintenance, deployments, and routine operational tasks to increase efficiency and reduce manual effort.
  • Partner closely with development teams to support and influence operational architecture and design changes.
  • Lead cross-functional, technically complex projects, driving execution and alignment across teams.
  • Act as a strategic technical resource across the organization, developing and delivering presentations for internal teams, customers, and external conferences.
  • Mentor junior team members, fostering growth, knowledge sharing, and operational excellence.
  • Set a high standard for documentation and runbook quality, leading by example to promote clarity, consistency, and operational readiness.

Benefits

  • Comprehensive health, dental and vision coverage
  • Generous parental leave policies
  • Access to our Employee Assistance Program and various Wellness programs
  • A 401(k) with company matching
  • A Lifestyle Spending Account
  • An unlimited vacation policy

What This Job Offers

Job Type

Full-time

Career Level

Mid Level

Education Level

No Education Listed

Number of Employees

1,001-5,000 employees
