Sr. Software Engineer

Blackhawk Network, Dallas, TX
Hybrid

About The Position

We are looking for a Senior or Staff Software Engineer with deep Site Reliability Engineering expertise and strong Python development skills to join our SRE team. You will own the reliability, scalability, and observability of a critical data pipeline platform. This is a builder-and-operator role: you will write production-grade Python services, implement observability-as-code infrastructure, and drive reliability improvements across a complex AWS-native platform. You will work at the intersection of software engineering and infrastructure, eliminating toil through automation, building internal developer platforms, and ensuring our systems meet the reliability bar our partners and customers expect.

Requirements

  • Bachelor's degree in Information Technology, Computer Science, or related field; or equivalent experience.
  • 7+ years of software engineering experience
  • 4+ years of SRE or platform engineering experience in a production environment
  • Strong Python proficiency: production services, automation, REST APIs
  • Experience with AWS: EKS, EC2, RDS/Aurora, S3, IAM, VPC, CloudWatch, Security Groups
  • Hands-on Terraform experience: writing modules, managing state, CI/CD integration
  • Experience with observability platforms: Splunk and New Relic (NRQL, NerdGraph)
  • Strong understanding of SRE principles: SLOs/SLIs/error budgets, toil reduction, incident management, capacity planning
  • Experience with Kubernetes/EKS pod operations
  • Strong Linux/Unix fundamentals: log management, performance debugging

Nice To Haves

  • Experience with log pipeline tooling: Fluentd, OpenTelemetry
  • Familiarity with Splunk, New Relic, OpenSearch, Elasticsearch for log storage and search
  • Experience building internal developer platforms or tooling: CLI tools, internal APIs, automation frameworks
  • Familiarity with MCP (Model Context Protocol) server development or AI Gateway architecture

Responsibilities

  • Own and improve service reliability, availability, and performance across a distributed AWS platform (EKS, EC2, RDS, S3)
  • Define and track SLOs, SLIs, and error budgets for critical services and partner-facing integrations
  • Drive initiatives to identify manual, repetitive operational work and eliminate it through automation
  • Partner with engineering teams to embed reliability practices into the SDLC
  • Build and maintain observability-as-code infrastructure using Terraform and New Relic NerdGraph
  • Design multi-signal observability pipelines (metrics, logs, traces) across Splunk, New Relic, and OpenSearch
  • Architect and maintain log routing and data pipelines using Fluentd and Sawmills
  • Build New Relic NerdGraph integrations and custom NRQL dashboards for platform health visibility
  • Maintain and extend Terraform modules for AWS infrastructure provisioning: Security Groups, EKS node groups, Aurora, IAM roles
  • Contribute to the MCP server architecture (Python-based internal developer tooling) integrating Jira, Confluence, Bitbucket, and observability platforms
  • Design, build, and maintain production-grade Python services and automation: REST APIs, background workers, CLI tooling, and data pipeline adapters
  • Mentor junior and mid-level engineers through code reviews, pair programming, and architecture guidance
  • Work closely with program managers and product stakeholders

Benefits

  • 401k with employer match
  • medical
  • dental
  • vision
  • 12 paid holidays in 2025
  • sick pay accrual according to state law
  • parental leave
  • life insurance
  • disability insurance
  • accident and illness insurance
  • health and dependent care flexible spending accounts
  • wellness benefits
  • flexible time off for all full-time employees