About The Position

As Point72 reimagines the future of investing, our Technology team continually advances the firm’s IT infrastructure and engineering capabilities, positioning us at the forefront of a rapidly evolving technology landscape. We’re a team of experts who experiment to discover new ways to harness open-source solutions, modern cloud architectures, and sophisticated Artificial Intelligence (AI) solutions, while embracing enterprise agile methodologies. Our commitment to building and innovating in the AI space provides a framework intended to drive smarter decision-making and enhance how we build and operate our platforms and applications.

Requirements

  • Bachelor’s degree in computer science, engineering, or a related field, or equivalent work experience
  • 5+ years of experience with enterprise scheduling or workload automation tools, including ActiveBatch, CA Workload Automation, Control-M, and/or AutoSys
  • 2+ years of experience in a site reliability, DevOps, or production support role with exposure to SLA management and SLI/SLO frameworks
  • Hands-on expertise with ActiveBatch or similar workload automation tools including job scheduling, calendars, dependencies, security, and versioning
  • Familiarity with Apache Airflow concepts, including DAG design, operators, executors, and deployment patterns
  • Strong scripting skills in PowerShell
  • Proven track record of troubleshooting complex, distributed workflows and performing root-cause analysis
  • Experience building and managing monitoring solutions using tools such as Splunk, Datadog, and/or Prometheus/Grafana
  • Ability to partner with application owners, business analysts, and infrastructure teams to drive continuous improvements
  • Excellent communication skills with the ability to translate technical concepts for non-technical stakeholders
  • Commitment to the highest ethical standards

Responsibilities

  • Serve as the Subject-Matter Expert (SME) for our enterprise scheduling platforms
  • Maintain, tune, and upgrade the scheduling environment to ensure stability and high availability
  • Develop and enhance automation solutions using PowerShell and other scripting languages to streamline workload orchestration
  • Build, configure, and refine monitoring dashboards, alerts, and reports to track system health, throughput, and performance
  • Lead incident response for high-priority scheduling failures, including troubleshooting, resolution, and root-cause analysis
  • Define, establish, and report on SLIs, SLOs, and SLAs for critical business workflows
  • Collaborate with cross-functional teams to onboard new workflows, optimize job dependencies, and implement best practices
  • Create and maintain comprehensive documentation, runbooks, and training materials for end users and support teams
  • Participate in a rotational on-call schedule to support 24/7 operations and critical incident management

Benefits

  • Fully paid health care benefits
  • Generous parental and family leave policies
  • Mental and physical wellness programs
  • Volunteer opportunities
  • Non-profit matching gift program
  • Support for employee-led affinity groups representing women, minorities, and the LGBTQ+ community
  • Tuition assistance
  • A 401(k) savings program with an employer match and more