Service Reliability Engineer

Proofpoint • Posted about 1 year ago

Full-time • Mid Level

Tampa, FL

501-1,000 employees

Administrative and Support Services

As a Service Reliability Engineer at Proofpoint, you will play a crucial role in ensuring the reliability and performance of our next-generation security products. The position involves managing and scaling production services across multiple data centers and AWS, contributing to an architecture that improves scalability and operability, and collaborating with cross-functional teams to strengthen service reliability.

Responsibilities

Build long-lasting, effective partnerships across the organization to foster collaboration among the Product, Engineering, and Operations teams.
Manage an international, 24x7, multi-site production infrastructure powering Proofpoint services, including deployment, maintenance, troubleshooting, performance tuning, and security.
Root-cause complex scaling and performance problems that span network, hardware, and software and involve multiple stakeholders.
Ensure proper monitoring, alerting, capacity planning and reporting in the production environment.
Contribute to the evolving design and architecture of reliable and scalable infrastructure.
Act as the first line of defense during working hours for any alerts or incidents that arise.
Collaborate with product engineering teams to ensure Operations standards are observed, determine resource impacts for upcoming product deployments, and ensure successful product rollouts.
Participate in an on-call rotation and be willing to jump on escalated issues as needed.

Requirements

3-5 years of experience managing, troubleshooting, and tuning Linux systems.
Experience with industry-standard foundation technologies such as TCP/IP, HTTP, DNS, SMTP, and LDAP.
Experience managing a large distributed computing environment.
Experience with virtualization technologies such as KVM, VMware vSphere, and/or OpenStack.
Excellent verbal and written communication skills.
Experience with monitoring and alerting systems.
Experience with industry-standard operational practices such as change management and incident management.
Experience with configuration management tools such as Puppet or Chef.
Experience automating management of systems and applications using Perl, Python, or Ruby.
Experience with load-balancing technologies such as F5, NetScaler, or similar.
Experience with Kafka, Elasticsearch, Cassandra, and MySQL.
Experience with public cloud platforms such as Amazon Web Services (EC2) or Microsoft Azure.
BS in Computer Science, Engineering, or a related technical discipline, or equivalent experience.

Preferred Qualifications

Experience with automation tools and frameworks.
Familiarity with security best practices in cloud environments.

Benefits

Flexible time off
Robust well-being program providing four global well-being days per year
3-week work from anywhere option
Competitive salary
Variable pay and/or equity options
