Site Reliability Engineering Lead

LineVision, Inc. | Boston, MA
Hybrid

About The Position

Hybrid: Boston, MA Headquarters (1-2 days/week in office)

Lead the establishment of LineVision's SRE practice and shape how we deliver grid-grade reliability. We are seeking a Site Reliability Engineering Lead to build our first dedicated SRE function from the ground up. You will define the standards, practices, and frameworks that ensure our grid intelligence platform meets the exceptional reliability requirements of utility customers operating mission-critical infrastructure. This is a high-impact individual contributor role: you will be hands-on in implementing reliability infrastructure and strategic in driving organizational adoption of SRE practices. If you want to combine deep technical expertise with cross-functional influence to establish reliability practices that directly impact grid operations, join us at LineVision, one of Built In Boston's Best Places to Work!

Requirements

  • SRE Practice Building: Demonstrated experience establishing SRE practices, including defining SLOs, implementing error budgets, and driving organizational adoption, not just maintaining existing practices
  • Cross-Functional Planning & Influence: Proven ability to plan and sequence complex initiatives across multiple teams, influence without authority, and drive technical standards adoption
  • AWS Expertise: Deep hands-on experience with production AWS services including EC2, RDS, Lambda, VPC configuration, and networking
  • Observability & Monitoring: Expert proficiency with tools like Datadog, Prometheus, Grafana, or CloudWatch for instrumenting distributed systems
  • Infrastructure as Code: Strong experience with Terraform, CloudFormation, or Pulumi
  • Programming & Automation: Experience with Python and TypeScript for instrumentation, automation, and tooling

Nice To Haves

  • Experience establishing SRE practices at high-scale technology companies and translating them to different organizational contexts
  • Background in energy, utility, or critical infrastructure sectors where reliability directly impacts operations
  • Track record driving technical standards adoption across engineering organizations without direct authority
  • Strategic thinking about balancing quick wins with long-term infrastructure investments
  • Ability to operate at both tactical (hands-on implementation) and strategic (organizational influence) levels

Responsibilities

  • Establish LineVision's SRE practice from the ground up - define Service Level Objectives, implement observability frameworks, and build deployment safety guardrails while driving organizational adoption of SRE methodologies
  • Be hands-on with reliability infrastructure - instrument services, configure monitoring tools, build dashboards, create alerting frameworks, and establish incident response procedures
  • Plan and influence across teams - partner strategically with engineering, platform, product, and customer support to sequence SRE initiatives, balance competing priorities, and drive adoption of reliability standards without direct authority
  • Communicate reliability as business value - translate technical metrics, error budgets, and system health into business impact for both technical teams and executive stakeholders

Benefits

  • Flexibility. You will be empowered to maintain work-life balance with trust-based PTO and a flexible work schedule.