Application Engineering Technical Lead - III

Vanguard
Malvern, PA
Hybrid

About The Position

Join a resiliency-focused engineering team that is building practical, production-grade GenAI capabilities to make operations faster, safer, and more predictable. This Technical Lead III role sits at the intersection of Site Reliability Engineering and platform engineering, with significant greenfield work to design, prove, and scale AI-driven approaches to incident response and operational workflows. You will lead hands-on delivery across the full lifecycle: selecting and validating model fit (including evaluation and testing), building AWS AI-enabled platform components (for example, using Amazon Bedrock and Model Context Protocol (MCP) patterns), and integrating those capabilities into real SRE tooling such as runbook automation, remediation workflows, and internal self-service experiences. The goal is simple: measurable reliability outcomes, delivered through well-engineered automation that teams can trust and adopt.

Requirements

  • Minimum of ten years of related work experience, with at least five years of development experience.
  • Demonstrated experience leading or significantly influencing reliability/operations outcomes (e.g., incident response, observability, automation, resilience patterns).
  • Undergraduate degree or equivalent combination of training and experience.

Nice To Haves

  • Graduate degree preferred.
  • Experience delivering AI‑assisted operational capabilities (e.g., AI‑driven triage suggestions, runbook automation) and familiarity with AWS Bedrock and/or Model Context Protocol (MCP) concepts.

Responsibilities

  • Provides senior level Site Reliability Engineering technical lead services and direction for critical reliability, observability, automation, and platform initiatives across multiple platforms.
  • Provides technical expertise and completes complex design, implementation, architecture specification, and maintenance activities.
  • Ensures the viability of IT deliverables.
  • Recommends development and architectural options and approves the team’s technical deliverables.
  • Conducts testing, including functionality, technical limitations, and security.
  • Provides production support for products.
  • Identifies potential solutions and approves technical solutions proposed by team members.
  • Escalates complex technical issues to IT experts.
  • Resolves technical problems discovered by testers and internal clients.
  • Responds to and resolves technical issues in a timely manner.
  • Researches issues and performs root cause analysis.
  • Anticipates technology problems and prevents them.
  • Resolves potential technology problems introduced by less experienced staff before they cause an issue.
  • Communicates with key stakeholders on project issues and implications, including operational risk and reliability tradeoffs.
  • Evaluates the impacts of change requests on technologies and effectively persuades and influences others on ideas.
  • Maintains a current and working knowledge of IT development methodology, architecture design, and technical standards.
  • Mentors IT staff and identifies training needs.
  • As new standards are instituted, ensures their usage by team members.
  • Learns and applies new technology quickly, and is well versed on the latest technologies and tools supporting software development and reliability engineering in the industry.
  • Assists with the creation of strategic product roadmaps, including reliability and automation outcomes.
  • Reviews and approves documentation and diagrams created by IT team members.
  • Writes documentation, including technical standards, processes, runbooks, and operational playbooks.
  • Identifies opportunities for continuous quality improvement of technical standards, methodologies, and technologies, including improvements that reduce operational load and improve service health.
  • Develops code and test artifacts that reuse subroutines or objects, are well structured, are backed by automated tests, include sufficient comments, and are easy to maintain.
  • Writes programs, appropriate test artifacts, ad hoc queries, and reports.
  • Employs contemporary software development techniques to ensure tests are implemented in a way that supports automation.
  • Amplifies the productivity of the entire team through collaboration and the implementation of technology frameworks, automation patterns, and GenAI‑enabled operational tooling (e.g., incident triage assistance, runbook automation, intelligent alerting).
  • Thoroughly understands and complies with Information Technology and Information Security policies and procedures, and verifies that deliverables meet requirements.
  • Works across organizational lines to improve both IT and Information Security policies and procedures.
  • Maintains a broad understanding of the roles of front, middle and back office, and designs systems to enable efficient business processes while maintaining necessary controls.
  • Maintains a broad understanding of the roles of external partners and designs systems to enable efficient business processes while maintaining necessary controls.
  • Participates in special projects and performs other duties as assigned.