Technical Operations Manager

Supernova Technology
Chicago, IL
59d
$140,000 - $180,000

About The Position

Supernova Technology is seeking an experienced Technical Operations (TechOps) Manager to lead and evolve our infrastructure operations function. This role will oversee system administration, site reliability engineering (SRE), and infrastructure monitoring — ensuring our production environments are secure, scalable, and highly available. The ideal candidate is a hands-on technical leader with a passion for building world-class monitoring and alerting frameworks, improving system reliability, and driving operational maturity across environments.

Requirements

  • 7+ years of experience in Technical Operations, Infrastructure, or SRE roles, including at least 2 years in a leadership or management capacity.
  • Proven success managing system administration, SRE, or DevOps teams in production environments.
  • Strong understanding of cloud infrastructure (AWS, Azure, or GCP), networking, and Linux system administration.
  • Hands-on experience with modern monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, Splunk, New Relic).
  • Solid grasp of automation, CI/CD pipelines, and Infrastructure-as-Code (e.g., Terraform, Ansible).
  • Deep knowledge of incident management processes and ITIL or reliability best practices.
  • Excellent communication, leadership, and collaboration skills.

Responsibilities

  • Lead and mentor the TechOps team responsible for infrastructure, SRE, and system administration functions.
  • Oversee cloud and on-premises environments, ensuring stability, scalability, and security.
  • Partner with engineering and security teams to design and implement reliable deployment and maintenance processes.
  • Define and enforce infrastructure standards, SLAs, and operational best practices.
  • Develop and maintain a robust monitoring and alerting strategy across all systems, applications, and services.
  • Implement tools and dashboards to provide visibility into system performance, uptime, and incident trends.
  • Drive continuous improvement in alert quality — minimizing noise while ensuring rapid detection of critical issues.
  • Establish and track key reliability metrics (e.g., uptime, latency, MTTR, MTBF).
  • Oversee incident response processes to ensure quick resolution, root-cause identification, and post-incident learning.
  • Implement reliability engineering principles to reduce operational toil and prevent recurrence of major incidents.
  • Collaborate with engineering teams on infrastructure scaling, redundancy, and capacity planning initiatives.
  • Develop and enforce operational runbooks, maintenance schedules, and change management processes.
  • Proactively identify and address potential points of failure in systems and processes.
  • Evaluate and adopt new tools that enhance monitoring, automation, and overall reliability.
  • Ensure system and infrastructure documentation is accurate and up to date.

Benefits

  • Medical, Dental, and Vision Insurance: Multiple plans with coverage for employees and dependents.
  • HSA and FSA Accounts: Tax-advantaged accounts for health and dependent care expenses.
  • Life and Disability Insurance: Employer-paid basic coverage with options for additional voluntary coverage.
  • Compensation: $140,000-$180,000
  • Retirement Savings: 401(k) plan with employer contributions.
  • Employee Assistance Program (EAP): Confidential support services, including free therapy sessions.
  • Paid Time Off: Flexible PTO policies.
  • Additional Perks: Commuter benefits, pet insurance, continuing education assistance, and more.

What This Job Offers

  • Job Type: Full-time
  • Career Level: Manager
  • Education Level: No Education Listed
  • Number of Employees: 251-500 employees
