Worldpay (posted 2 months ago)
$108,200 - $181,800 per year
5,001-10,000 employees

This position is not eligible for sponsorship, now or in the future. Candidates must be US citizens or Green Card holders.

Are you ready to write your next chapter? Make your mark at one of the biggest names in payments. With proven technology, we process the largest volume of payments in the world, driving the global economy every day. When you join Worldpay, you join a global community of experts and changemakers, working to reinvent an industry by constantly evolving how we work and making the way millions of people pay easier, every day.

Worldpay powers 2.2 trillion payments annually across 146 countries in over 135 currencies, supporting more than a million merchants globally. As the largest acquirer by volume globally, Worldpay provides a reliable, secure, and scalable payments platform 24x7, 365 days a year.

As part of the 200-strong Infrastructure Services organization, you'll help engineer and deliver the core infrastructure services that power our payments platform. We're responsible for running mission-critical systems: 20,000 servers maintained through an automation platform, thousands of databases, and petabytes of storage hosted in our data centers and the public cloud. We are looking for talented individuals to join the Infrastructure Services organization: self-starters with an analytical mindset who can act as change agents.

Responsibilities:
  • Implement and manage observability tools such as Splunk, Zabbix, Dynatrace, Datadog, and similar platforms for infrastructure, applications, and cloud services.
  • Set up and configure dashboards, alerts, and reports that provide visibility into system health, performance, and availability.
  • Develop and maintain centralized logging solutions to ensure comprehensive logging coverage, log retention, and log security.
  • Work with IT, DevOps, and product teams to define key performance indicators (KPIs) and service-level objectives (SLOs) for critical systems and applications.
  • Support the monitoring and troubleshooting of production systems, using observability tools to identify performance bottlenecks, anomalies, and incidents.
  • Assist in automating monitoring tasks and creating self-healing scripts to enhance system reliability.
  • Analyze logs and telemetry data to provide insights for incident detection, root cause analysis, and performance optimization.
  • Participate in on-call rotations, responding to incidents and using observability tools for rapid diagnosis and resolution.
  • Collaborate with security teams to ensure log management solutions support security monitoring and incident investigation.
  • Continuously evaluate and recommend improvements to observability and log management practices, tools, and processes.

Qualifications:
  • At least 5 years of experience in IT Operations, with a focus on monitoring, observability, and log management.
  • Solid understanding of OpenTelemetry (OTel)-based monitoring and observability concepts, including metrics, logs, traces, and alerts.
  • Hands-on experience with observability and monitoring tools (e.g., Splunk Observability, Zabbix, Datadog, Dynatrace, Prometheus, Grafana, New Relic).
  • Strong understanding of log management best practices, including centralized logging, data retention, and privacy requirements.
  • Familiarity with cloud platforms (e.g., AWS, Azure, GCP) and managing cloud-based monitoring solutions.
  • Experience in designing and implementing system health dashboards, alerting mechanisms, and automated incident response processes.
  • Strong problem-solving skills and the ability to work under pressure in a fast-paced environment.
  • Basic scripting skills (e.g., Python, Bash) for task automation and custom monitoring solutions.
  • Excellent communication and collaboration skills, with the ability to work with cross-functional teams.
  • A bachelor's degree or higher in computer science, information technology, or a related field.
  • Equivalent practical experience may be substituted for formal education.
  • Knowledge of ITIL or similar frameworks for incident and problem management.
  • Exposure to DevOps principles and experience with CI/CD pipelines.
  • Experience in container monitoring (e.g., Kubernetes, Docker) and cloud-native architectures.
  • Technical certifications in cloud and virtualization technologies are highly valued.
  • Certifications such as AWS, Azure, MCSE, Red Hat, VMware Certified Professional (VCP), VMware Certified Advanced Professional (VCAP), or Citrix Certified Associate - Virtualization (CCA-V), or certifications in Datadog, Dynatrace, Splunk, or other observability tools.

Benefits:
  • A competitive salary and benefits.
  • Time to support charities and give back to your community.
  • Parental leave policy.
  • Global recognition platform.
  • Virgin Pulse access.
  • Global employee assistance program.