Proofpoint • posted about 1 year ago
Full-time • Mid Level
Tampa, FL
501-1,000 employees
Administrative and Support Services

As a Service Reliability Engineer at Proofpoint, you will play a crucial role in ensuring the reliability and performance of our next-generation security products. This position involves managing and scaling production services across multiple data centers and AWS, contributing to the architecture for improved scalability and operability, and collaborating with cross-functional teams to enhance service reliability.

Responsibilities

  • Build long lasting, effective partnerships across the organization to foster collaboration between Product, Engineering and Operations teams.
  • Manage an international, 24x7, multi-site production infrastructure that powers Proofpoint services, including deployment, maintenance, troubleshooting, performance tuning, and security.
  • Root-cause complex scaling and performance problems that span network, hardware, and software and involve multiple stakeholders.
  • Ensure proper monitoring, alerting, capacity planning and reporting in the production environment.
  • Contribute to the evolving design and architecture of reliable and scalable infrastructure.
  • Act as the first line of defense during working hours for any alerts or incidents that arise.
  • Collaborate with product engineering teams to ensure Operations standards are observed, determine resource impacts for upcoming product deployments, and ensure successful product rollouts.
  • Participate in an on-call rotation and be willing to jump on escalated issues as needed.

Qualifications

  • 3-5 years' experience managing, troubleshooting, and tuning Linux systems.
  • Experience with industry-standard foundation technologies such as TCP/IP, HTTP, DNS, SMTP, and LDAP.
  • Experience managing a large distributed computing environment.
  • Experience with virtualization technologies such as KVM, VMware vSphere, and/or OpenStack.
  • Excellent verbal and written communication skills.
  • Experience with monitoring and alerting systems.
  • Experience with industry-standard operational practices such as change management and incident management.
  • Experience with configuration management tools such as Puppet or Chef.
  • Experience automating management of systems and applications using Perl, Python, or Ruby.
  • Experience with load-balancing technologies such as F5, NetScaler, or similar.
  • Experience with Kafka, Elasticsearch, Cassandra, and MySQL.
  • Experience with public cloud providers such as Amazon EC2 or Microsoft Azure.
  • BS or equivalent experience in Computer Science, Engineering or related technical discipline.
  • Experience with automation tools and frameworks.
  • Familiarity with security best practices in cloud environments.

Benefits

  • Flexible time off
  • Robust well-being program providing 4 global well-being days per year
  • 3-week work from anywhere option
  • Competitive salary
  • Variable pay and/or equity options