Customer Reliability Engineer

iManage | Chicago, IL
$103,000 - $159,000 | Hybrid

About The Position

Being a Customer Reliability Engineer at iManage means you’re a data-driven problem solver with experience triaging issues at scale. You are an expert at explaining complex technical problems to a variety of audiences. You are a passionate customer advocate who works proactively to anticipate problems, then addresses them or raises their visibility in a data-driven way. You will be obsessed with uptime. When production breaks, you lead the charge. If it hasn’t broken yet, you’re already building the monitors, queries, and automation that make sure it doesn’t.

The Customer Reliability team is expert in the consumption and reliability of the iManage Cloud platform across services and interfaces, both first-party and partner. The team guides the way we anticipate, communicate, and react to reliability issues while driving iterative improvements across products and services, both internally and externally. When multi-faceted reliability problems don’t fit existing constructs, the team engages as a customer advocate to guide data-driven decision making and holistic improvements across the platform based on lessons learned.

Requirements

  • 3–5 years’ experience in a technical escalation role in a Support, CRE, Development, or SRE function.
  • Proven ability to understand and communicate highly complex issues for non-technical audiences, including executive stakeholders.
  • Seasoned incident responder with hands-on experience leading P1/P2 response, running postmortems, and driving systemic fixes that prevent recurrence.
  • Experience troubleshooting and supporting distributed cloud services.
  • Strong technical proficiency across SQL, Python, Bash/Shell, PowerShell, and REST APIs.
  • Solid understanding of Azure Kubernetes Service (AKS) and related Azure services.
  • Deep knowledge of observability and support platforms such as Splunk, Grafana, Kibana, and Prometheus.
  • Strong technical troubleshooting and problem-solving skills; ability to think through situations outside the norm and develop appropriate solutions.
  • Proven track record of driving automation and self-service improvements — a bias toward building, not just operating.

Responsibilities

  • Developing and maintaining a deep technical knowledge of iManage platform services, with the ability to personally deep dive into logs, queries, and infrastructure to unblock different teams.
  • Collaborating with customer-facing, product, and infrastructure teams on the development and deployment of scalable, reliable software.
  • Driving a shift from reactive to proactive: identifying what is about to break before it breaks, using telemetry, anomaly detection, and trend analysis to surface systemic issues, then partnering with technical stakeholders to eliminate them at the source.
  • Serving as incident commander on P1/P2 incidents: leading communication and coordinating engineering stakeholders to restore service.
  • Continuously refining the observability stack — building and tuning dashboards, alerts, and synthetic monitoring that give real-time visibility into system and end-user experience health.
  • Serving as the escalation point for the Platform Support team, acting as Subject Matter Expert (SME) on the consumption and observability of our services; sharing knowledge to level up expertise across the team.
  • Engaging with Engineering and SRE teams to improve supportability through internal tooling and observability; leveraging large-scale data sets to troubleshoot complex emerging problems.
  • Assisting with remediation, tooling, and communication for Customer Advisories; driving critical technical escalations with engineering teams through problem isolation, data gathering, and validation of resolution.
  • Personally leading post-incident root cause analysis; owning postmortems end-to-end, running blameless postmortem reviews, and driving systemic fixes that prevent recurrence.
  • Advocating for users and stakeholders by exposing friction and reliability concerns within the products.
  • Driving automation and self-service that eliminates repeat tickets and matures our knowledge management strategy; implementing AI-powered operational improvements that raise quality and speed.
  • Engaging with iManage partners proactively to build agreements and shared understanding of reliability, levelling up knowledge of reliability principles across the iManage ecosystem.

Benefits

  • Flexible working policy
  • Unlimited access to LinkedIn Learning courses
  • Unlimited access to interactive Microsoft courses & training
  • Comprehensive Health/Vision/Dental/Life Insurance
  • 401k Retirement Savings Plan with a company match up to 4%
  • Enhanced leave for expecting parents (20 weeks 100% paid for primary leave, and 10 weeks 100% paid for secondary leave)
  • Flexible time off policy
  • Multiple company wellness days each year
  • Access to RethinkCare, a global behavioral health platform