Senior Software Engineer, Platform Infrastructure

Roku•San Jose, CA

19d•$280,000 - $380,000•Hybrid

About The Position

Teamwork makes the stream work.

Roku is changing how the world watches TV

Roku is the #1 TV streaming platform in the U.S., Canada, and Mexico, and we've set our sights on powering every television in the world. Roku pioneered streaming to the TV. Our mission is to be the TV streaming platform that connects the entire TV ecosystem. We connect consumers to the content they love, enable content publishers to build and monetize large audiences, and provide advertisers unique capabilities to engage consumers.

From your first day at Roku, you'll make a valuable - and valued - contribution. We're a fast-growing public company where no one is a bystander. We offer you the opportunity to delight millions of TV streamers around the world while gaining meaningful experience across a variety of disciplines.

About the Team

Our DevOps/SRE team runs an active-active, multi-cloud platform on AWS and GCP to keep business-critical systems highly available, secure, and fast at internet scale. We focus on reliability and automation, engineering systems that perform under stress and continuously improve. We are engineers who own outcomes end-to-end: managing priorities, communicating clearly with technical and non-technical stakeholders, and delivering impact across the organization. If you thrive on architecting at scale, automating everything you can, and turning complex infrastructure into reliable, well-documented systems, you'll feel right at home here.

About the Role

We are seeking a talented and experienced DevOps/SRE (Site Reliability Engineering) Senior Software Engineer to join our dynamic team. The ideal candidate will have a strong background in DevOps practices, cloud infrastructure management, automation, and team leadership skills.
If you have a consistent track record of architecting and building large-scale systems, enjoy solving intriguing system challenges at internet scale, are innovative at heart, and balance strong skills in learning, organizing, and building with a drive to make an impact, this role might be a great fit for you!

"For California Only - The estimated annual salary for this position is between $280,000 and $380,000. Compensation packages are based on factors unique to each candidate, including but not limited to skill set, certifications, and specific geographical location. This role is eligible for health insurance, equity awards, life insurance, disability benefits, parental leave, wellness benefits, and paid time off."

Requirements

8+ years of experience in DevOps/SRE roles
BS degree in Computer Science or equivalent
Experience with several of the following: Kubernetes, Docker, ECS, and service meshes such as Istio, Envoy, Linkerd, or Solo
Experience in cloud-focused software development, preferably in Go, Python, or other object-oriented programming languages
Experience with Infrastructure as Code (IaC) tools such as Terraform, Ansible, or CloudFormation
Experience with CI/CD automation, including GitLab pipelines and related tools
Solid understanding of networking, security, and compliance principles including the intricacies of multi-tenant architecture and secure network configuration in cloud environments
Strong understanding of distributed systems, microservices architecture, and cloud-native technologies
Strong hands-on experience with cloud platforms such as AWS, GCP, or Azure
Proven track record of implementing scalable, high-performance infrastructure solutions in fast-paced, dynamic environments
Demonstrated ability to communicate clearly with both technical and non-technical project stakeholders, with the ability to work effectively in a cross-functional team environment
The drive and self-motivation to understand the intricate details of a complex infrastructure environment

Nice To Haves

Certifications in relevant technologies such as Certified Kubernetes Administrator (CKA), AWS Certified DevOps Engineer, or Certified Information Systems Security Professional (CISSP) are preferred

Responsibilities

Design, implement, and maintain an active-active, multi-cloud infrastructure on AWS and GCP that supports business-critical systems, using automation to ensure high availability and performance and delivering systems that stay reliable and perform under stress
Collaborate with your peers through code reviews, ensuring best practices and aligning on technical standards to deliver consistent, high-quality solutions
Collaborate with security teams to ensure the integrity and security of infrastructure and applications, including by implementing security best practices and compliance standards
Manage individual project priorities, deadlines, and deliverables by leveraging agile methodologies and maintaining clear communication channels
Lead incident response efforts by triaging issues effectively, collaborating closely with cross-functional teams to resolve them promptly and minimize downtime
Implement effective incident management processes and post-incident reviews
Identify performance bottlenecks through detailed monitoring and profiling, and optimize system resources by fine-tuning configurations, scaling infrastructure, and addressing latency issues
Drive continuous improvement initiatives by automating repetitive tasks, refining workflows, and proactively addressing technical debt within the team, while driving enhancements across the organization
Maintain comprehensive documentation of systems, processes, and procedures while fostering a culture of knowledge sharing and contributing to the collective learning of the team
Participate in a 24x7 on-call rotation and be available to work with global teams in the event of critical outages