SRE Consultant

NTT DATA Services, LLC
Hybrid

About The Position

Site Reliability Engineering Consultant – New Jersey, Hybrid (2 days/wk)

The Site Reliability Engineering Consultant will be responsible for the development and overall implementation of software in a complex, critical, and large cross-departmental and multi-disciplinary area. This role is part of a multi-year transformation journey that will require the successful candidate to establish best practices and to motivate and promote a cultural shift that ensures successful adoption of Engineering Principles and Practices within Production Management.

The role…

  • requires a comprehensive understanding of multiple areas within a function and how they interact to achieve the objectives of the function
  • applies an in-depth understanding of the business impact of technical contributions
  • is accountable for delivery of a full range of end-to-end projects
  • requires excellent communication skills to negotiate internally
  • involves short- to medium-term planning of actions and resources for own area

Requirements

  • Relevant experience in a critical software development role with high business impact, and the ability to understand how software delivers business value
  • Excellent engineering skills and senior-level architecture experience
  • Excellent working knowledge of key computer science concepts (networking, operating systems, virtualization, containerization, etc.)
  • Polyglot full-stack developer mentality and ability to pick up new languages and skills
  • Excellent understanding of Software Engineering concepts like Software Development Life Cycle and GitOps
  • Excellent debugging and analytical skills: ability to isolate root cause across networking/infrastructure, application and database stacks
  • Experience delivering software using Agile delivery methodologies (Scrum/Kanban) is a must
  • Strong experience with end-to-end observability stacks (Datadog, AppDynamics, Dynatrace, etc.) is desirable
  • Degree in computer science/mathematics/physics or related technical subject is desirable
  • Experience of senior stakeholder management
  • Consistently demonstrates clear and concise written and verbal communication skills
  • 9+ years in a site reliability engineering related role, with proven hands-on expertise and demonstrable technical proficiency in the following:
      • Programming (Java, Python, or equivalent)
      • Containerization
      • Kubernetes
      • GitOps
      • High Availability Systems
      • Infrastructure as Code
      • Configuration Management
      • Observability (tools and implementation)
      • Hyperscale Systems
      • Middleware configuration

Nice To Haves

  • Operational experience deploying and running services at scale on a Docker/Kubernetes stack with a service mesh, such as Istio, is highly desirable
  • Operational experience with CI/CD orchestration tools and Infrastructure-as-Code tooling (Terraform, CloudFormation, etc.) is highly desirable
  • Operational experience of using middleware technologies (MQ, Apache Kafka, etc.) to run services at scale is desirable

Responsibilities

  • Demonstrate an in-depth understanding of Software Development Lifecycle and how it integrates within the overall technology landscape to deliver scalable, reliable and resilient applications.
  • Operate in a global environment with on-/near-/off-shore matrix reporting structures.
  • Operate in a highly regulated environment that requires in-depth understanding of the regulatory requirements and the industry implications for our technologies.
  • Improve the service level the team provides to our end users, which includes maximizing operational efficiencies and strengthening incident management, problem management, and knowledge-sharing practices.
  • Drive Continuous Delivery and Automation efforts across the supported applications by means of Root Cause Analysis reviews, Knowledge management, Performance tuning, and user training.
  • Foster a culture that promotes transparency and innovation for increased team productivity.
  • Coach team members and colleagues outside the immediate reporting line on best practices, and recognize anti-patterns so that they are quickly addressed.
  • Implement the Agile Framework through one of its implementations like SCRUM or Kanban and ensure it integrates with overall organization processes.
  • Proactively communicate progress and project status across the organization and ensure that stakeholders are managed appropriately throughout the execution period.

Benefits

  • medical, dental, and vision insurance with an employer contribution
  • flexible spending or health savings account
  • life and AD&D insurance
  • short- and long-term disability coverage
  • paid time off
  • employee assistance
  • participation in a 401k program with company match
  • additional voluntary or legally-required benefits