Senior Software Engineer I - Technical Operations & Quality Engineering

The New York Public Library | New York, NY
23d | $125,000 - $130,000 | Hybrid

About The Position

Overview

This role requires working onsite in the NYC office three days a week.

NYPL Digital’s Quality Assurance team is undergoing a transformation from traditional manual testing to a modern technical operations function that enables engineering velocity. We are seeking a Senior Software Engineer with proven expertise in infrastructure automation, monitoring systems, and cross-departmental collaboration to lead critical technical initiatives that will standardize and modernize how our engineering teams work.

This role combines hands-on technical work with strategic influence. You'll architect solutions while building bridges between Digital and DevOps teams. You'll also champion the adoption of AI-powered tools and workflows to accelerate development and testing practices. Your work will directly impact how 30+ engineers across multiple departments collaborate, deploy, and monitor production systems.

We are looking for someone we can count on to:

Own:

  • Infrastructure standardization initiatives across Digital and DevOps
  • Monitoring and observability strategy implementation
  • Technical documentation and training programs
  • Cross-departmental collaboration workflows

Teach:

  • Infrastructure-as-Code best practices to engineering teams
  • Monitoring and observability techniques
  • Automation strategies to reduce manual work
  • Technical leadership through example
  • AI-powered development and testing approaches

Learn:

  • NYPL's unique technical landscape and organizational dynamics
  • Public sector technology constraints and opportunities
  • Advanced monitoring and infrastructure patterns

Improve:

  • Cross-team collaboration between Digital and DevOps
  • Infrastructure reliability and standardization
  • Team technical capabilities through mentorship
  • Operational excellence practices

Some expectations for this role are that within:

1 month, this person will:

  • Understand the current state of Terraform infrastructure and monitoring tools
  • Build relationships with key stakeholders in Digital and DevOps
  • Identify quick wins for infrastructure and deployment improvements
  • Begin mentoring QA engineers on technical practices

3 months, this person will:

  • Lead the Terraform migration and refactoring initiative and establish new infrastructure-as-code workflows
  • Begin consolidating monitoring tools
  • Evaluate and pilot AI-assisted testing and development tools
  • Demonstrate technical leadership to the QA team

6 months and beyond, this person will:

  • Complete infrastructure standardization with full adoption
  • Achieve monitoring consolidation with teams trained on new tools
  • Establish themselves as the go-to expert for operational excellence
  • Show measurable improvements in deployment reliability and team productivity

Requirements

  • Bachelor's degree in Computer Science, Software Engineering, or related field OR 5-7 years of equivalent experience
  • 4-7 years of professional software development experience
  • Strong programming skills in Python and/or other languages
  • Deep understanding of infrastructure-as-code principles and practices
  • Expertise in monitoring, observability, and reliability engineering
  • Experience with AWS and cloud infrastructure
  • Proficiency with CI/CD pipelines and deployment automation
  • Experience with or strong interest in AI-powered development and testing tools
  • Excellent written and verbal communication skills, with the ability to work across organizational boundaries
  • A systems-thinking approach to complex problems, with a track record of improving operational efficiency
  • Strong project management skills, with the ability to drive organizational change

Responsibilities

  • Partner with DevOps to migrate and refactor the Terraform codebase to establish a single source of truth for infrastructure
  • Lead consolidation of monitoring tools (Prometheus, Zabbix, Uptime.com, CloudWatch, New Relic, SolarWinds) into a streamlined stack
  • Implement standardized monitoring practices and define key metrics for each team
  • Build and maintain internal tooling to reduce manual workflows and improve operational efficiency
  • Lead evaluation and implementation of AI-assisted testing frameworks and developer tools (GitHub Copilot, etc.)
  • Design AI-powered automation workflows for testing and development processes
  • Create comprehensive documentation and training materials for new tools and practices
  • Build strong working relationships between Digital and DevOps teams
  • Drive adoption of new standards through effective change management
  • Provide hands-on technical mentorship to QA engineers
  • Own the reliability and accuracy of core operational processes