Lead Site Reliability Engineer

RBC | Jersey City, NJ
1d | Hybrid

About The Position

RBC Capital Markets LLC seeks a Lead Site Reliability Engineer in Jersey City, NJ to support and spearhead the development and implementation of Site Reliability Engineering Solutions (monitoring and alerting, machine learning anomaly detection, self-healing, and reliability testing) for all applications. Lead the production environment, build software systems to manage platform infrastructure and applications, and enhance our software solutions' reliability, quality, and time-to-market. Lead and assist in incident management and problem management for applications within scope. Ensure all systems and applications comply with the scope of work. Remote work permitted 2-3 days per week.

Requirements

  • Bachelor’s degree in computer science or related technical field, plus 5 years of experience working within a financial institution, focusing on site reliability engineering
  • 5 years of experience in SRE/DevOps, application support, and the software development lifecycle (SDLC), including C/C++ and Java
  • 5 years of experience with a variety of environments, including Linux, Windows, and macOS; databases (Oracle, PostgreSQL, MongoDB, SQL Server, and NoSQL); cloud, distributed, and mainframe systems; business workflows; and services/APIs
  • 5 years of experience with database querying, KDB+, Q, cloud, Docker, and data, plus experience automating simple tasks to reduce toil and increase operating system efficiency
  • 5 years of experience using automation tools, including RPA, UiPath, Ansible, Chef, and Puppet
  • 5 years of experience with SRE languages, monitoring tools, and service management (Dynatrace, Moogsoft, PagerDuty, ServiceNow, Elastic, Logstash, Kibana, Blue Prism, Catchpoint, Grafana, YAML, Global Service Desk)
  • 5 years of experience with job scheduling tools (CAWA, Airflow, MessageWay, Control-M), messaging platforms (Solace, MQ, Kafka), CI/CD (Jenkins, Fusion, Ant, BladeLogic), and vulnerability management (SAST, DAST, IT risk remediation)

Responsibilities

  • Support and spearhead the development and implementation of Site Reliability Engineering solutions (monitoring and alerting, machine learning anomaly detection, self-healing, and reliability testing) for all applications
  • Lead the production environment
  • Build software systems to manage platform infrastructure and applications
  • Enhance our software solutions' reliability, quality, and time-to-market
  • Lead and assist in incident management and problem management for applications within scope
  • Ensure all systems and applications comply with the scope of work

Benefits

  • A 401(k) program with company-matching contributions
  • Health, dental, vision, life, and disability insurance
  • Paid time-off plan