Site Reliability Engineer [Multiple Positions Available]

JPMorgan Chase & Co., Plano, TX
Onsite

Requirements

  • Bachelor's degree in Information Technology, Computer Science, or related field of study plus 7 years of experience in the job offered or as Site Reliability Engineer, Software Engineer, IT Project Manager, or related occupation.
  • Five (5) years of experience gathering non-functional test requirements, including details about the application under test, technology stack, hosting infrastructure, system user load, service level agreements (SLAs), and user workload models
  • Five (5) years of experience designing load test scripts in at least one of the following programming languages: C, C++, Java, JavaScript, or Python
  • Five (5) years of experience designing load test scripts with at least one of the following testing tools: LoadRunner, Microsoft VSTS, Silk Performer, NeoLoad, IBM RPT, or JMeter
  • Five (5) years of experience designing user workload models to execute load, stress, and endurance tests
  • Five (5) years of experience identifying performance bottlenecks with at least one of the following profiling tools: VisualVM or JProfiler
  • Five (5) years of experience measuring and monitoring metrics with AppDynamics and Dynatrace
  • Five (5) years of experience creating dashboards for telemetry and alerting with at least one of the following tools: Splunk, Grafana, AppDynamics, or Dynatrace
  • Five (5) years of experience building CI/CD pipelines to automate end-to-end load test execution and load testing infrastructure maintenance
  • Five (5) years of experience building CI/CD pipelines in Groovy to automate load test activities
  • Five (5) years of experience implementing Git and SVN functionality
  • Five (5) years of experience automating test data extraction
  • Five (5) years of experience automating load test result generation and reporting
  • Five (5) years of experience analyzing performance test results
  • Five (5) years of experience conducting root cause analysis for complex business processes and functionality
  • Five (5) years of experience performance tuning garbage collection, heap management, and JVM configuration
  • Five (5) years of experience scripting in Python and PowerShell
  • Three (3) years of experience applying site reliability engineering principles and practices across multiple applications
  • Three (3) years of experience creating test automation scripts with Selenium
  • Three (3) years of experience writing Python utilities to manage services and MQ testing
  • Three (3) years of experience performing database testing and tuning
  • Two (2) years of experience translating quantitative information into actionable insights based on service level indicators (SLIs) and service level objectives (SLOs)
  • Two (2) years of experience managing patches, upgrades, and maintenance on Linux and Windows infrastructure
  • One (1) year of experience conducting chaos testing with Gremlin to assess application resiliency and identify potential issues
  • One (1) year of experience defining SLOs and SLIs for applications

Responsibilities

  • Troubleshoot, performance tune, and engineer both in-house and vendor products related to identity and access management and privileged access management.
  • Run, maintain, and improve technology solutions in line with established service level objectives by applying software engineering principles.
  • Develop tools and visualizations to gain insights into customer experience and product interactions.
  • Collaborate with the development team throughout the software life cycle to build reliable systems and deploy them across different regions.
  • Develop solutions to automate manual development and operational tasks.
  • Ensure the availability, performance, change management, telemetry, and capacity management of technology solutions.
  • Perform root cause analysis and participate in postmortems to identify and address gaps and to enhance security solutions.
  • Analyze usage and telemetry data to identify patterns and to predict and prevent failures in technology applications and services.
  • Evaluate and test products before and after changes.
  • Divide and allocate manual operational and engineering work.
  • Provide support coverage and troubleshoot time-critical issues.

Benefits

  • Comprehensive health care coverage
  • On-site health and wellness centers
  • A retirement savings plan
  • Backup childcare
  • Tuition reimbursement
  • Mental health support
  • Financial coaching