Production Operations / Reliability Engineer

Blueprint Technologies, Redmond, WA
$34 - $36, Onsite

About The Position

In this role, you will support the reliability, stability, and live operations of a new device and software platform during internal testing and self-host programs. You will focus on monitoring system health through telemetry, investigating live issues, supporting software releases, and validating prototype devices in production-like environments. This is a hands-on, engineering-oriented operations role where you will work closely with software engineers, QA, infrastructure, and product partners to ensure operational readiness and service stability. You will independently manage day-to-day monitoring, triage incidents, support release validation, and provide clear, actionable insights to improve system reliability and product readiness.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
  • 5–7+ years of experience in software engineering, DevOps, SRE, production engineering, service operations, or infrastructure roles.
  • Experience monitoring live systems using telemetry, logs, metrics, dashboards, and alerting tools.
  • Strong troubleshooting skills across software, services, and device environments.
  • Experience supporting software releases, deployments, and post-release validation.
  • Ability to independently investigate technical issues and clearly communicate findings.
  • Experience documenting incidents, operational procedures, and troubleshooting steps.
  • Familiarity with CI/CD pipelines, release workflows, and incident response practices.
  • Strong problem-solving skills and ability to work independently in a fast-paced environment.

Nice To Haves

  • Experience working with mobile operating systems (Android strongly preferred).
  • Experience supporting prototype devices, hardware/software integration, or lab-based environments.
  • Familiarity with hybrid, cloud, or on-prem infrastructure monitoring.
  • Hands-on experience gathering device logs and performing in-person troubleshooting.
  • Experience improving operational processes, alerting quality, or monitoring coverage.
  • Background supporting internal self-host programs or early-stage product releases.

Responsibilities

  • Monitor telemetry, dashboards, logs, alerts, and metrics to assess the health of services, applications, and prototype devices.
  • Identify anomalies, failures, and performance degradation across software and device environments.
  • Analyze real-time and historical data to diagnose issues and surface reliability risks.
  • Triage operational issues and communicate findings clearly to engineering and product teams.
  • Recommend improvements to monitoring coverage, alert quality, and operational visibility.
  • Support software releases by validating deployments and monitoring post-release system stability.
  • Track service and device health during rollouts, updates, and release validation periods.
  • Investigate and assist in resolving live issues impacting internal users or device readiness.
  • Partner with engineering teams on mitigations, fixes, rollbacks, and follow-up validation.
  • Document release observations, risks, and stability assessments.
  • Support incident response by gathering logs, diagnostics, and impact data.
  • Summarize incidents, suspected root causes, and mitigation progress.
  • Participate in post-incident reviews and document lessons learned.
  • Maintain records of incidents, recurring issues, and known reliability risks.
  • Identify opportunities to reduce operational toil through documentation or process improvements.
  • Perform in-person troubleshooting for prototype devices and self-hosted systems when needed.
  • Assist with device configuration, deployment, validation, and health checks.
  • Run smoke tests and readiness checks to confirm system and device stability.
  • Document hardware configurations, operational procedures, and environment setup.
  • Work cross-functionally with software engineering, QA, infrastructure, and product teams.
  • Communicate system health, risks, and technical findings clearly and concisely.
  • Provide regular status updates, health summaries, and operational reports.
  • Operate independently while escalating issues appropriately when deeper engineering support is required.

Benefits

  • Medical, dental, and vision coverage
  • Flexible Spending Account
  • 401k program
  • Competitive PTO offerings
  • Parental Leave
  • Opportunities for professional growth and development