Senior Associate - Workload Automation Engineer

New York Life Insurance Co
Hybrid

About The Position

Serve as the engineering owner for New York Life’s enterprise workload automation ecosystem. You’ll operate and harden scheduling platforms and calendars, design resilient restart/rerun patterns, and standardize job definitions, logging, and audit evidence across environments. Your work will ensure critical batch chains run predictably, meet SLAs, and support a consistent, automation-first operating model. By applying software engineering practices to operations, you will keep the workload automation application and services reliable, scalable, and performant. You will also automate infrastructure, define and monitor service-level objectives (SLOs), improve observability, and lead incident response to minimize downtime and strengthen system resilience in support of enterprise scheduling requirements.

Requirements

  • 5–8+ years of experience in enterprise workload automation, SRE, or production operations supporting mission-critical batch processing.
  • Hands-on experience with Stonebranch or at least one major enterprise scheduler (e.g., ESP, Control-M, AutoSys, IBM Workload Scheduler/TWS, Redwood), including operating controllers/agents across environments and managing calendars/holiday tables and SLA jeopardy configurations.
  • Strong scripting and automation skills in PowerShell, Bash, or Python, plus SQL and familiarity with YAML/JSON and REST APIs.
  • Experience with Git-based workflows and CI/CD pipelines for job-as-code and configuration promotion.
  • Proven design and implementation of restart/rerun patterns, dependency modeling, and idempotent batch frameworks.
  • Excellent coordination skills across incident and change processes, with clear, concise communication to technical and non-technical stakeholders.
  • Strong AWS foundation:
      • Core services: EC2, S3, RDS/DynamoDB, VPC, IAM, Lambda
      • Networking: subnets, routing, load balancers (ALB/NLB), security groups
      • High availability & scaling: Auto Scaling, multi-AZ/region patterns
      • Observability: CloudWatch, X-Ray, logging pipelines
      • Cost awareness and optimization basics
  • Infrastructure as Code & automation:
      • Terraform, CloudFormation, or CDK
      • Config management (Ansible, etc.)
      • Building repeatable, scalable infrastructure
  • Security & “hardening” basics:
      • IAM best practices (least privilege)
      • Secrets management (AWS Secrets Manager, Parameter Store)
      • Patch management, vulnerability scanning
      • Network isolation and encryption

Nice To Haves

  • Experience in financial services or other highly regulated industries.
  • Background standardizing multiple schedulers and creating common audit schemas and evidence-capture patterns.
  • Relevant certifications such as ITIL, cloud architect/operations, DR/BC (e.g., DRII/BCI), or security (e.g., CISSP).

Responsibilities

  • Support various platforms, including Windows, Linux, macOS, and cloud environments (e.g., AWS, Azure).
  • Operate and maintain scheduling controllers and agents across multiple platform environments.
  • Manage calendars and holiday tables; configure SLA jeopardy thresholds, alerting, and escalation paths.
  • Implement platform upgrades, patches, and configuration changes in line with standards and change governance.
  • Design restart/rerun patterns (checkpointing, idempotent wrappers) and failure-handling flows for critical batches.
  • Model dependencies and schedules as code (job-as-code) in version control with CI/CD-based promotion.
  • Reduce single points of failure and improve consistency across job chains and environments.
  • Investigate and resolve application performance bottlenecks by analyzing code, queries, APIs, and data flows.
  • Focus on key reliability and performance indicators: uptime, system throughput, system output, and download rate/application load speed.
  • Define and maintain standard naming conventions, templates, parameters, and calendars across schedulers.
  • Engineer common audit-evidence and log schemas to support internal and external reviews.
  • Ensure data retention, traceability, and segregation of duties align with policies and regulatory requirements.
  • Design and implement automation solutions using Java, JavaScript, APIs, SQL, and Terraform.
  • Implement pre/post checks, synthetic probes, and health validations for batch workflows.
  • Define and maintain SLIs/SLOs for batch completion, success rates, and recovery times.
  • Build safeguards that detect anomalies and misconfigurations before they impact downstream processes.
  • Provide expert support and troubleshooting across network and enterprise service issues, ensuring minimal disruption to business operations.
  • Integrate schedulers with observability tools (logs, metrics, dashboards) to improve visibility.
  • Tune job concurrency, execution windows, and resource usage for performance and cost efficiency.
  • Reduce noisy alerts and improve the signal-to-noise ratio for incident responders.
  • Identify opportunities to improve support processes and implement best practices to enhance overall efficiency.
  • Build monitoring, observability dashboards, and alerting systems; monitor network and platform performance, proactively identify and address potential issues, and close gaps identified during troubleshooting.
  • Align scheduler changes, maintenance, and releases with APSO/Change Management processes.
  • Lead incident triage and resolution for batch failures, including rapid root-cause analysis and safe restarts/reruns.
  • Contribute to post-incident reviews and drive remediation actions into platform and pattern improvements.
  • Collaborate with Application Owners/Developers, DBAs/Data teams, SRE/Observability, Security, and Vendors to keep batch chains healthy and compliant.
  • Provide guidance on best practices for job design, scheduling windows, dependencies, and error handling.
  • Document patterns, playbooks, and standards; mentor peers and junior engineers in workload automation.

Benefits

  • Leave programs
  • Adoption assistance
  • Student loan repayment programs