Application Engineering Technical Lead - III

Vanguard
Malvern, PA
Hybrid

About The Position

This Technical Lead III role is part of a resiliency-focused engineering team dedicated to building practical, production-grade GenAI capabilities that improve operational speed, safety, and predictability. The position bridges Site Reliability Engineering and platform engineering, and involves significant greenfield work to design, prove, and scale AI-driven approaches for incident response and operational workflows. The role requires leading hands-on delivery across the full lifecycle, including selecting and validating model fit, building AWS AI-enabled platform components (such as Amazon Bedrock and Model Context Protocol patterns), and integrating these capabilities into SRE tooling such as runbook automation, remediation workflows, and internal self-service experiences. The ultimate goal is to achieve measurable reliability outcomes through trusted, adoptable, well-engineered automation.

Requirements

  • Minimum of ten years of related work experience, with at least five years of development experience.
  • Demonstrated experience leading or significantly influencing reliability/operations outcomes (e.g., incident response, observability, automation, resilience patterns).
  • Undergraduate degree or equivalent combination of training and experience.

Nice To Haves

  • Graduate degree preferred.
  • Experience delivering AI‑assisted operational capabilities (e.g., AI‑driven triage suggestions, runbook automation) and familiarity with Amazon Bedrock and/or Model Context Protocol (MCP) concepts.

Responsibilities

  • Provides senior-level Site Reliability Engineering technical lead services and direction for critical reliability, observability, automation, and platform initiatives across multiple platforms.
  • Provides technical expertise and completes complex design, implementation, architecture specification, and maintenance activities.
  • Ensures the viability of IT deliverables.
  • Recommends development and architectural options and approves the team’s technical deliverables.
  • Conducts testing, including functionality, technical limitations, and security.
  • Provides production support for products.
  • Identifies potential solutions and approves technical solutions proposed by team members.
  • Escalates complex technical issues to IT experts.
  • Resolves technical problems discovered by testers and internal clients.
  • Responds to and resolves technical issues in a timely manner.
  • Researches issues and performs root cause analysis.
  • Anticipates technology problems and prevents them.
  • Resolves potential technology problems introduced by less experienced staff before they cause an issue.
  • Communicates with key stakeholders on project issues and implications, including operational risk and reliability tradeoffs.
  • Evaluates the impacts of change requests on technologies and effectively persuades and influences others on ideas.
  • Maintains a current and working knowledge of IT development methodology, architecture design, and technical standards.
  • Mentors IT staff and identifies training needs.
  • As new standards are instituted, ensures their usage by team members.
  • Learns and applies new technology quickly, and is well versed in the latest industry technologies and tools supporting software development and reliability engineering.
  • Assists with the creation of strategic product roadmaps, including reliability and automation outcomes.
  • Reviews and approves documentation and diagrams created by IT team members.
  • Writes documentation, including technical standards, processes, runbooks, and operational playbooks.
  • Identifies opportunities for continuous quality improvement of technical standards, methodologies, and technologies, including improvements that reduce operational load and improve service health.
  • Develops code and test artifacts that reuse subroutines or objects, are well structured, are backed by automated tests, include sufficient comments, and are easy to maintain.
  • Writes programs, appropriate test artifacts, ad hoc queries, and reports.
  • Employs contemporary software development techniques to ensure tests are implemented in a way that supports automation.
  • Amplifies the productivity of the entire team through collaboration and the implementation of technology frameworks, automation patterns, and GenAI‑enabled operational tooling (e.g., incident triage assistance, runbook automation, intelligent alerting).
  • Thoroughly understands and complies with Information Technology and Information Security policies and procedures, and verifies that deliverables meet requirements.
  • Works across organizational lines to improve both IT and Information Security policies and procedures.
  • Maintains a broad understanding of the roles of front, middle and back office, and designs systems to enable efficient business processes while maintaining necessary controls.
  • Maintains a broad understanding of the roles of external partners and designs systems to enable efficient business processes while maintaining necessary controls.
  • Participates in special projects and performs other duties as assigned.