Principal Software Engineer - Ad Tech & Distributed Systems - FreeWheel

Comcast • Chicago, IL

49d • $152,829 - $229,243

About The Position

The Principal Software Engineer - Ad Tech & Distributed Systems is responsible for leading reliability, performance, and operational excellence of the FreeWheel platforms. This role focuses on designing, operating, and troubleshooting large‑scale distributed systems while owning monitoring, incident response, change management, and capacity planning. As a technical subject matter expert, the Principal Software Engineer leads and resolves complex issues, automates operational workflows, and partners with engineering, vendors, and client services to deliver scalable, high‑quality solutions. The role operates with limited supervision, applying sound judgment and independently developing solutions for non‑routine and complex challenges.

Requirements

10+ years of professional experience in software development/engineering, with a proven track record of designing, building, and maintaining scalable applications
5+ years of experience with AWS
Expert‑level coding, debugging, and troubleshooting skills across complex, distributed production systems
Strong experience designing and operating server‑side applications or services using Python, Go, or Scala
Experience developing, operating, and troubleshooting distributed systems and backend services
Familiarity with data processing platforms, data pipelines, and large-scale system architectures
Deep knowledge of Linux systems, system internals, networking, and production infrastructure
Extensive experience with AWS cloud architecture and services including VPC, subnets, NACLs, security groups, EC2, S3, IAM, Route 53, Lambda, and related services
Proficiency with infrastructure‑as‑code and configuration management tools and practices
Mastery of CI/CD and SDLC tools (Docker, Kubernetes, Jenkins, Git, Ansible, Chef, and Puppet)
Strong understanding of database technologies, SQL, performance tuning, and operational data management
Advanced analytical and data‑driven problem‑solving skills, including use of metrics to guide decisions
Strong communication skills, attention to detail, adaptability, and ability to work effectively within a global, cross‑functional team

Nice To Haves

Proven ability to lead and mentor engineers in automation, reliability engineering, and production problem‑solving

Responsibilities

Own production reliability, availability, latency, and performance of large‑scale, mission‑critical systems
Design, implement, and operate monitoring, alerting, and observability solutions to ensure system health and rapid detection of issues
Lead incident response, root cause analysis, and post‑incident reviews to drive long‑term reliability improvements
Support and ensure stable operations during high‑visibility, time‑sensitive live events and releases
Drive automation initiatives to reduce operational toil, improve efficiency, and increase system resilience
Partner with software engineering teams to influence architecture and design decisions with production readiness in mind
Lead and execute change management, capacity planning, and production readiness reviews
Champion security, vulnerability management, and secure configuration practices across production environments
Enforce and continuously improve Engineering Operations processes, standards, and best practices
Participate in on‑call rotations, including weekend coverage, and provide escalation support for complex production issues