Lead Software Engineer

First Citizens Bank, Raleigh, NC

About The Position

We are seeking an experienced SRE Lead to build and maintain reliable, scalable infrastructure supporting our data engineering platform. This lead role focuses on ensuring data systems' operational excellence, automation, cost optimization, and disaster recovery while mentoring a growing team.

Requirements

  • Bachelor's Degree and 6 years of experience in software application development and maintenance, OR High School Diploma or GED and 10 years of experience in software application development and maintenance
  • Snowflake Platform (deep operational knowledge: warehouses, clustering, query optimization, costs)
  • Infrastructure as Code: Terraform, CloudFormation, or similar (AWS-focused preferred)
  • Data orchestration: Airflow, Dagster, dbt Cloud operational patterns
  • Observability tools: Splunk, Dynatrace, Datadog, CloudWatch, Prometheus/Grafana, or equivalent
  • CI/CD & Git workflows: GitHub, GitLab, AZDO or similar
  • AWS services (data): EC2, S3, Glue, Lambda, RDS, networking, and cost management
  • Linux/Unix system administration and troubleshooting
  • Python or SQL for automation and tooling
  • Incident management and postmortem discipline
  • Systems thinking and holistic problem-solving
  • Strong communication and cross-functional collaboration
  • Technical depth with operational breadth
  • Proactive mindset: anticipate failure, prevent incidents, improve continuously
  • Comfort with on-call responsibilities and urgent troubleshooting
  • Ability to balance automation ROI with immediate operational needs
  • Mentoring and team building capabilities
  • 7+ years in SRE, DevOps, or Infrastructure Engineering
  • 3+ years in a lead or senior technical role
  • 2+ years supporting data platforms or analytics infrastructure (Snowflake, dbt, data warehouses)

Responsibilities

  • Maintain highly available, fault-tolerant data platforms on Snowflake, dbt Cloud, and AWS; establish SLOs/SLAs and implement monitoring to meet them.
  • Own incident response processes, postmortem culture, and continuous improvement; reduce MTTR through root cause analysis and preventative measures.
  • Implement comprehensive monitoring, alerting, and logging across data infrastructure using tools such as Splunk, Dynatrace, or similar; design dashboards for real-time visibility into system health.
  • Maintain dbt Cloud jobs, Airflow DAGs, and Snowflake performance.
  • Design anomaly detection and proactive alerting to prevent data incidents before they impact users.
  • Lead IaC initiatives using Terraform for AWS resources, Snowflake provisioning, and dbt Cloud configuration.
  • Manage deployment pipelines, scaling policies, and resource provisioning to reduce manual toil.
  • Build self-service tools and runbooks enabling engineers to safely operate infrastructure.
  • Conduct regular cost audits; optimize Snowflake warehouse sizing, query performance, and cluster configurations; implement auto-suspend/auto-resume policies.
  • Monitor cloud resource utilization across compute, storage, and data transfer; identify cost-saving opportunities and implement chargeback models.
  • Balance performance with cost through intelligent caching, compression, materialized views, and query optimization recommendations.
  • Enforce RBAC/ABAC policies, network segmentation, and encryption at rest and in transit; manage secrets, API keys, and credentials.
  • Take part in security reviews, penetration testing, and threat modeling; maintain disaster recovery and business continuity plans.
  • Lead and mentor junior SREs; establish technical standards, best practices, and on-call rotations.
  • Drive documentation culture; maintain runbooks, architecture diagrams, and troubleshooting guides for operational knowledge transfer.
  • Collaborate with data engineers on reliability concerns; advise on architecture decisions with production readiness in mind.
  • Forecast infrastructure capacity needs; plan for growth and resource scaling aligned with business requirements.
  • Regularly test disaster recovery procedures; maintain backups, perform failover drills, and document recovery time objectives (RTOs) and recovery point objectives (RPOs).

Benefits

  • Benefits are an integral part of total rewards and First Citizens Bank is committed to providing a competitive, thoughtfully designed and quality benefits program to meet the needs of our associates. More information can be found at https://jobs.firstcitizens.com/benefits.