Staff Customer Engineer

Harness
$148,000 - $182,000 · Remote

About The Position

We are looking for a Staff Software Engineer [Customer Facing] for our customer engineering team. This is a high-impact hybrid role that combines hardcore DevOps/SRE problem-solving with hands-on internal tooling and direct customer consulting. You will troubleshoot complex, often ambiguous issues for enterprise customers across cloud and container environments, tackling broken pipelines, deployment failures, connectivity problems, and misconfigured infrastructure. You will own these issues end-to-end, act as a trusted technical advisor to customer engineering teams, and feed key findings back to our core Product and Engineering organizations. You will also engineer internal tools, diagnostic utilities, and playbooks. Crucially, you will help us build the future of our operations by developing internal AI tools on top of LLMs, training models on our own proprietary data to automate diagnostics and streamline workflows. This is a highly customer-facing role for people with an SRE or platform background who have the coding chops to build robust tooling and who thrive when collaborating directly with customers on complex technical challenges.

Requirements

  • 6+ years in a customer-facing engineering, DevOps, SRE, or platform engineering role — where troubleshooting was a core part of the work, not an occasional side task.
  • Solid hands-on experience with Kubernetes (k8s), ECS, and Docker — you can work through a broken pod, misconfigured ingress, or failed deployment without being guided step by step.
  • Experience across at least two major cloud platforms (AWS, GCP, Azure), including debugging infrastructure issues, IAM/permissions, and networking problems.
  • Solid understanding of CI/CD concepts and tooling — Harness, Jenkins, GitHub Actions, CircleCI, or equivalent. You need to understand how pipelines break and why; Harness-specific knowledge can be learned on the job.
  • Comfortable owning a customer call without escalating every decision — able to drive a troubleshooting session, communicate progress clearly, and set honest expectations when a fix takes time.
  • Able to write up technical findings clearly — for an engineer who needs the detail and for a manager who needs the summary.
  • Proficiency with Linux systems, networking fundamentals (DNS, TLS, load balancing), and distributed system debugging.
  • Scripting ability in at least one language (Python, Node.js, Bash, or similar) — enough to automate a diagnostic, build a small utility, or clean up a repetitive task.
  • Comfortable reading source code to understand how a product behaves, identify where something may be breaking, and form a hypothesis without waiting for Engineering to explain it.
  • Hands-on experience with observability tooling — Datadog, Splunk, Prometheus, or similar — for diagnosing performance issues and tracing failures across distributed systems.

Nice To Haves

  • Infrastructure-as-Code experience — Terraform, Pulumi, CloudFormation, or similar.
  • AI & LLM Experience: Building internal tools on top of LLMs, fine-tuning models on custom datasets, or familiarity with RAG architectures.
  • Emerging AI Ecosystems: Knowledge of AI agents and Model Context Protocol (MCP) servers.
  • Experience with secrets management tools (HashiCorp Vault, AWS Secrets Manager, etc.).
  • Comfortable using AI tools (e.g. GitHub Copilot, ChatGPT, or similar) to accelerate troubleshooting, write scripts, or build internal utilities.

Responsibilities

  • Serve as the primary technical resource for enterprise customers during complex troubleshooting, onboarding, and expansion.
  • Read source code to understand how a product behaves, identify where something may be breaking, and form a hypothesis without waiting for core Engineering to explain it.
  • Own complex customer issues across Kubernetes (k8s), ECS, Docker, cloud platforms (AWS, GCP, Azure), and on-premise/hybrid environments — from first contact through to resolution.
  • Perform root-cause analysis on pipeline failures, deployment issues, runner/agent connectivity, secrets management errors, and service-to-service communication.
  • Debug infrastructure automation, execution logs, and metrics data across CloudWatch, Google Cloud Operations/Stackdriver, and Azure Monitor.
  • Lead incident triage during escalations, coordinate cross-functionally, and deliver clear technical findings to Engineering.
  • Reproduce edge-case bugs with clean reproduction steps and drive resolution in partnership with Product and Engineering.
  • Develop and maintain runbooks, troubleshooting guides, and customer-facing playbooks.
  • Lead live troubleshooting sessions, screenshares, and technical calls, communicating clearly both with hands-on engineers and with engineering managers who need the short version.
  • Set clear expectations when issues are complex or slow-moving.
  • Guide customers through best-practice CI/CD configurations and deployment strategies suited to their environment.
  • When patterns emerge across customer issues, build scripts, utilities, or automation to address them — reducing manual effort for yourself and the team.
  • Design and engineer internal applications, automation scripts, and diagnostic utilities to eliminate manual effort across the Customer Engineering team.
  • Contribute to the development of internal AI tools powered by Large Language Models (LLMs), including training models on Harness's proprietary data to accelerate incident resolution.
  • Explore and integrate emerging AI frameworks and concepts (like Model Context Protocol / MCP servers) to enhance our internal tooling ecosystem.

Benefits

  • Competitive salary
  • Comprehensive healthcare benefits
  • Flexible Spending Account (FSA)
  • Flexible work schedule
  • Employee Assistance Program (EAP)
  • Flexible Time Off and Parental Leave
  • Monthly, quarterly, and annual social and team building events
  • Monthly internet reimbursement