Manager, Engineering - DevOps/SRE (Hybrid)

CrowdStrike•Sunnyvale, CA

3d•$140,000 - $215,000•Hybrid

About The Position

As a global leader in cybersecurity, CrowdStrike protects the people, processes and technologies that drive modern organizations. Since 2011, our mission hasn’t changed: we’re here to stop breaches, and we’ve redefined modern security with the world’s most advanced AI-native platform. We work on large-scale distributed systems, processing almost 3 trillion events per day, and this traffic is growing daily. Our customers span all industries, and they count on CrowdStrike to keep their businesses running, their communities safe and their lives moving forward. We’re also a mission-driven company. We cultivate a culture that gives every CrowdStriker both the flexibility and autonomy to own their careers. We’re always looking to add talented CrowdStrikers to the team who have limitless passion, a relentless focus on innovation and a fanatical commitment to our customers, our community and each other. Ready to join a mission that matters? The future of cybersecurity starts with you.

About the Role: At CrowdStrike, Site Reliability Engineering (SRE) is at the forefront of ensuring the reliability and scalability of our cloud-native security platform. In this role, you'll have the opportunity to manage a team of talented engineers, providing technical leadership on key projects and empowering them to excel in their roles. Our culture of intellectual curiosity and problem-solving is central to our success at CrowdStrike. We bring together individuals with varied backgrounds and perspectives, fostering collaboration and innovation in a blame-free environment.

What You'll Do: As an SRE Engineering Manager, you will lead a team responsible for managing the complex challenges of scale unique to CrowdStrike, leveraging your expertise in software engineering, systems design, and automation. You will play a critical role in ensuring that our services maintain the highest levels of reliability, uptime, and performance, meeting the needs of our customers while continuously improving our systems. You'll have the opportunity to work on meaningful projects while providing support and mentorship to your team, enabling them to learn, grow, and make a lasting impact in the cybersecurity landscape. The ideal candidate will have hands-on experience in cloud solutions development, strong leadership skills, and a collaborative approach to working with cross-functional teams. Given our remote-first culture, exceptional verbal and written communication skills are essential for effective collaboration with engineering teams and colleagues worldwide. Prior experience in the security industry is not required for this role.

Requirements

Proven track record of building, growing, and retaining high-performing SRE/Platform engineering teams in a fast-paced, high-growth environment.
10+ years of software engineering experience with significant focus on reliability engineering, platform infrastructure, and production operations at scale.
3+ years of hands-on management experience overseeing SRE/Platform engineering teams, including incident command and reliability ownership.
Deep understanding of SRE principles, including SLOs, SLAs, SLIs, and error budgeting strategies, as applied to large-scale distributed systems.
Proficiency in at least one cloud environment (AWS, Azure, GCP) with emphasis on multi-region architecture, cloud-native reliability patterns, and security-first cloud design.
Strong incident management background.
Bachelor's degree in Computer Science or related field, or equivalent work experience.
Ability to work 2+ days per week in our Sunnyvale offices.

Nice To Haves

Experience operating security platforms, telemetry pipelines, or sensor fleet infrastructure at massive scale (millions of connected endpoints, petabyte-scale data processing).
Proficiency in Golang, CrowdStrike's primary backend language for platform services.
Experience with Kubernetes at scale, managing large cluster fleets, including service mesh technologies (Istio/Linkerd) and container security practices.
Familiarity with high-throughput data streaming platforms such as Apache Kafka and Apache Flink for real-time event processing.
Experience with hybrid cloud environments spanning cloud and on-premises data center infrastructure, including multi-cloud failover strategies.
Advanced observability experience, including Prometheus, Grafana, distributed tracing (Jaeger/OpenTelemetry), and large-scale log aggregation (ELK/Splunk), with a focus on building custom SLO dashboards and reliability scorecards.
Experience building and running chaos engineering programs, such as fault injection testing and resilience validation, for security-critical systems.
Contributions to open-source projects or involvement in the tech community through conferences, meetups, or online forums.

Responsibilities

Lead a team responsible for managing the complex challenges of scale unique to CrowdStrike, leveraging expertise in software engineering, systems design, and automation.
Play a critical role in ensuring that services maintain the highest levels of reliability, uptime, and performance, meeting customer needs while continuously improving systems.
Work on meaningful projects while providing support and mentorship to the team, enabling them to learn, grow, and make a lasting impact in the cybersecurity landscape.
Manage a team of talented engineers, providing technical leadership on key projects and empowering them to excel in their roles.
Drive system reliability by blending software engineering principles with AI-driven automation, moving from reactive firefighting to proactive, automated operations.
Own reliability for high-throughput distributed systems processing millions of events per second, including capacity planning, traffic management, and load shedding strategies.
Lead major incident response, facilitate blameless postmortems, and drive systemic reliability improvements.
Build, operationalize, and maintain highly scalable, security-critical systems with zero tolerance for data loss or downtime.

Benefits

Market leader in compensation and equity awards
Comprehensive physical and mental wellness programs
Competitive vacation and holidays for recharge
Paid parental and adoption leaves
Professional development opportunities for all employees regardless of level or role
Employee Networks, geographic neighborhood groups, and volunteer opportunities to build connections
Vibrant office culture with world class amenities
Great Place to Work Certified™ across the globe
Health insurance
401(k)
Paid time off