Service Reliability Engineer

Proofpoint, Tallahassee, FL

About The Position

As a Service Reliability Engineer at Proofpoint, you will play a crucial role in ensuring the reliability and performance of our next-generation security products. This position involves managing and scaling production services across multiple data centers and AWS, contributing to architectural improvements, and collaborating with cross-functional teams to enhance service reliability and capacity.

Requirements

  • 3-5 years' experience managing, troubleshooting, and tuning Linux systems.
  • Experience with industry-standard foundation technologies such as TCP/IP, HTTP, DNS, SMTP, and LDAP.
  • Experience in management of a large distributed computing environment.
  • Experience with virtualization technologies such as KVM, VMware vSphere, and/or OpenStack.
  • Excellent verbal and written communication skills.
  • Experience with monitoring and alerting systems.
  • Experience with industry-standard operational practices such as change management and incident management.
  • Experience with configuration management tools such as Puppet or Chef.
  • Experience automating management of systems and applications using Perl, Python, or Ruby.
  • Experience with load-balancing technologies such as F5, NetScaler, or similar.
  • Experience with Kafka, Elasticsearch, Cassandra, and MySQL.
  • Experience with public cloud providers such as Amazon EC2 or Microsoft Azure.
  • BS or equivalent experience in Computer Science, Engineering or related technical discipline.

Nice To Haves

  • Experience with automation tools and frameworks.
  • Familiarity with security best practices in cloud environments.

Responsibilities

  • Build long-lasting, effective partnerships across the organization to foster collaboration between Product, Engineering, and Operations teams.
  • Manage an international 24x7, multi-site production infrastructure powering the Proofpoint services, including deployment, maintenance, troubleshooting, performance tuning, and security.
  • Root-cause complex scaling and performance problems that span network, hardware, and software and involve multiple stakeholders.
  • Ensure proper monitoring, alerting, capacity planning and reporting in the production environment.
  • Contribute to the evolving design and architecture of reliable and scalable infrastructure.
  • Act as the first line of defense during working hours for any alerts or incidents that arise.
  • Collaborate with product engineering teams to ensure Operations standards are observed, determine resource impacts for upcoming product deployments, and ensure successful product rollouts.
  • Participate in an on-call rotation and be willing to jump on escalated issues as needed.

Benefits

  • Flexible time off
  • Robust well-being program with 4 global well-being days per year
  • 3-week work from anywhere option

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Industry: Administrative and Support Services
  • Education Level: Bachelor's degree
