About The Position

Senior Engineer/Platform Leader accountable for designing, building, and operating secure, scalable AI/ML and Generative AI (GenAI) platforms in the cloud. This role develops and maintains reusable platform capabilities (e.g., model and prompt development environments, feature/model/prompt management, retrieval and knowledge-grounding patterns, data access patterns, CI/CD automation, evaluation/testing, and observability) so teams can deliver business outcomes faster while meeting Truist technology standards, security requirements, and regulatory obligations.

Requirements

  • Undergraduate degree in computer science, analytics, data engineering, finance, or an equivalent field.
  • At least 3 years of experience driving enterprise data strategy, data execution, data engineering, or software delivery.
  • Expert problem-solving skills and the ability to define detailed strategies.
  • Experience in the financial services or payments industry.
  • Experience meeting regulatory obligations and operating in a highly regulated cloud environment.
  • Experience building a high-performing team.

Nice To Haves

  • Master’s degree and/or 8+ years of progressive experience delivering complex cloud platforms, preferably supporting AI/ML or analytics workloads at enterprise scale.
  • Experience building AI/ML platforms and/or MLOps capabilities (e.g., training/inference automation, model packaging and deployment, model registry, experiment tracking, and operational monitoring).
  • Experience with container platforms and orchestration (e.g., Kubernetes/EKS), API enablement, and modern ML tooling (e.g., Python ML ecosystem) to operationalize models and GenAI services.
  • Deep expertise in AWS (compute, networking, security/IAM, logging/monitoring, and managed services) and moderate experience with Azure services and deployment patterns.
  • Hands-on DevOps/DevSecOps experience building CI/CD pipelines (GitLab), including automated testing, security scanning, artifact management, and controlled deployments across environments.
  • Strong infrastructure-as-code experience deploying cloud components using Terraform; ability to build reusable modules and enforce standards/guardrails.
  • Relevant cloud and security certifications (preferred), such as AWS Solutions Architect/DevOps Engineer, AWS Security Specialty, Azure Administrator/Architect, and/or Terraform certification.
  • Strong mentoring/coaching skills for engineers distributed across onshore and offshore teams.

Responsibilities

  • Design, build, and execute the AI/ML and GenAI platform strategy aligned to enterprise architecture, security, and risk standards.
  • Own the engineering and lifecycle management of AI/ML platform components (e.g., development workspaces, training/inference patterns, model registry, feature storage patterns, experiment tracking, prompt/version management, retrieval-augmented generation (RAG) enablement, and reusable templates) for safe and deliberate consumption across the organization.
  • Establish and champion DevSecOps practices for platform delivery, including GitLab source control, build automation, and CI/CD pipelines for infrastructure and application deployments.
  • Deploy infrastructure as code (IaC) to the cloud using Terraform modules and pipelines; define standards for environments, networking, identity, secrets, encryption, logging, and configuration management.
  • Partner with Cybersecurity, Risk, and other second-line-of-defense teams to implement and evidence required security controls (e.g., IAM least privilege, network segmentation, encryption, vulnerability management, audit logging, and policy-as-code) across platform services.
  • Implement governance patterns for AI/ML and GenAI (e.g., model and prompt lifecycle controls, lineage/traceability for data, prompts, and outputs, approvals, change management, risk assessments, and operational readiness) consistent with enterprise data governance and regulatory obligations.
  • Provide technical leadership and hands-on engineering to solve complex platform problems (performance, reliability, scalability, cost, and security), and guide engineers through designs, reviews, and delivery.
  • Build platform reliability through automation and observability (monitoring, logging, tracing, SLOs), and partner with production support teams to increase resiliency, reduce toil, and improve time to recover.
  • Enable self-service platform consumption via standardized APIs, reusable pipelines, templates, and documentation; in an Agile environment, may serve as an Agile/DevSecOps champion to accelerate delivery while maintaining compliance.

Benefits

  • All regular teammates (not temporary or contingent workers) working 20 hours or more per week are eligible for benefits, though eligibility for specific benefits may be determined by the division of Truist offering the position.
  • Truist offers medical, dental, vision, life insurance, disability, accidental death and dismemberment, tax-preferred savings accounts, and a 401(k) plan to teammates.
  • Teammates also receive no less than 10 days of vacation (prorated based on date of hire and by full-time or part-time status) during their first year of employment, along with 10 sick days (also prorated), and paid holidays.
  • Depending on the position and division, this job may also be eligible for Truist’s defined benefit pension plan, restricted stock units, and/or a deferred compensation plan.