About The Position

The Principal Software Engineer - Ad Tech & Distributed Systems is responsible for leading the reliability, performance, and operational excellence of the FreeWheel platforms. This role focuses on designing, operating, and troubleshooting large‑scale distributed systems while owning monitoring, incident response, change management, and capacity planning. As a technical subject matter expert, the Principal Software Engineer leads the resolution of complex issues, automates operational workflows, and partners with engineering, vendors, and client services to deliver scalable, high‑quality solutions. The role operates with limited supervision, applying sound judgment and independently developing solutions for non‑routine and complex challenges.

Requirements

  • 10+ years of professional experience in software development/engineering, with a proven track record of designing, building, and maintaining scalable applications
  • 5+ years of experience with AWS
  • Expert‑level coding, debugging, and troubleshooting skills across complex, distributed production systems
  • Strong experience designing and operating server‑side applications or services using Python, Go, or Scala
  • Experience developing, operating, and troubleshooting distributed systems and backend services
  • Familiarity with data processing platforms, data pipelines, and large-scale system architectures
  • Deep knowledge of Linux systems, system internals, networking, and production infrastructure
  • Extensive experience with AWS cloud architecture and services including VPC, subnets, NACLs, security groups, EC2, S3, IAM, Route 53, Lambda, and related services
  • Proficiency with infrastructure‑as‑code and configuration management tools and practices
  • Mastery of CI/CD and SDLC tools (Docker, Kubernetes, Jenkins, Git, Ansible, Chef, and Puppet)
  • Strong understanding of database technologies, SQL, performance tuning, and operational data management
  • Advanced analytical and data‑driven problem‑solving skills, including use of metrics to guide decisions
  • Strong communication skills, attention to detail, adaptability, and ability to work effectively within a global, cross‑functional team

Nice To Haves

  • Proven ability to lead and mentor engineers in automation, reliability engineering, and production problem‑solving

Responsibilities

  • Own production reliability, availability, latency, and performance of large‑scale, mission‑critical systems
  • Design, implement, and operate monitoring, alerting, and observability solutions to ensure system health and rapid detection of issues
  • Lead incident response, root cause analysis, and post‑incident reviews to drive long‑term reliability improvements
  • Support and ensure stable operations during high‑visibility, time‑sensitive live events and releases
  • Drive automation initiatives to reduce operational toil, improve efficiency, and increase system resilience
  • Partner with software engineering teams to influence architecture and design decisions with production readiness in mind
  • Lead and execute change management, capacity planning, and production readiness reviews
  • Champion security, vulnerability management, and secure configuration practices across production environments
  • Enforce and continuously improve Engineering Operations processes, standards, and best practices
  • Participate in on‑call rotations, including weekend coverage, and provide escalation support for complex production issues

Benefits

  • Best-in-class Benefits