Director of Platform Engineering

Ncontracts • Posted about 2 months ago

$180,000 - $230,000/Yr

Full-time • Director

Remote • Brentwood, TN

We are looking for an experienced Director of Platform Engineering, reporting to the CTO, to lead our cloud infrastructure, site reliability engineering (SRE), and DevOps enablement initiatives across the organization. This role requires a forward-thinking leader who leverages AI to transform platform operations and infrastructure management. The Director of Platform Engineering is responsible for overseeing the reliability, scalability, and security of our multi-cloud infrastructure while enabling development teams to deploy efficiently and safely. You will pioneer the integration of AI-driven automation, intelligent observability, and predictive infrastructure management to optimize our platform operations. This strategic position sits within the Engineering organization and works directly with Product, Security, and Development teams to ensure maximum system uptime, optimize cloud costs, and accelerate our software delivery capabilities through innovative use of AI and automation.

Lead and mentor a team of SRE engineers, DevOps engineers, and cloud infrastructure specialists while fostering a culture of AI-augmented operations
Define and execute the cloud strategy, including multi-cloud architecture, migration plans, and integrating AI-powered tools for infrastructure optimization, cost management, and capacity planning
Leverage AI and machine learning for predictive incident detection, automated remediation, and intelligent alerting to enhance system reliability
Establish and maintain SLOs, SLIs, and error budgets, using AI-driven analytics to identify patterns and prevent issues before they impact users, and drive a culture of reliability across engineering
Build and optimize CI/CD pipelines, infrastructure-as-code frameworks, and deployment automation
Manage cloud costs and implement FinOps practices to maximize ROI on cloud investments
Implement AI-powered infrastructure-as-code frameworks, automated deployment pipelines, and intelligent resource allocation to enable safe, rapid releases
Architect and maintain infrastructure supporting our AI-powered technology ecosystem, including LLM integrations, agentic workflows, containerized applications, message queuing, data pipelines, and storage systems
Partner with Security teams to ensure compliance, implement security best practices, and maintain SOC 2/ISO certifications
Drive incident response processes, post-mortem culture, and continuous improvement in system reliability and DR/BCP
Establish and maintain disaster recovery, business continuity, and backup strategies with documented runbooks and tested procedures
Stay at the forefront of AI innovations in platform engineering, evaluating and implementing emerging tools for AIOps, intelligent automation, and infrastructure optimization
Evangelize DevOps and SRE best practices across development teams to enable self-service capabilities

10+ years of experience in cloud infrastructure, SRE, or DevOps roles, with 3+ years in leadership positions
Demonstrated experience leveraging AI/ML tools for infrastructure automation, observability, incident management, or platform optimization
Deep expertise with major cloud platforms (AWS, Azure, and/or GCP) and cloud-native architectures, including cloud storage solutions (Azure Blob Storage, S3) and data architectures
Proven track record of managing and scaling SRE/DevOps teams in high-growth technology environments
Hands-on expertise with containerization technologies (particularly Azure Kubernetes Apps and Docker), infrastructure-as-code (Terraform, CloudFormation), and observability tools (Dynatrace, Datadog, Prometheus, Grafana)
Experience implementing CI/CD pipelines, GitOps workflows, and automated deployment strategies (Azure DevOps, GitHub Actions)
Experience with message queuing systems (RabbitMQ, Kafka, Azure Service Bus, SQS)
Experience with data platforms and ETL tools, including Snowflake and Azure Data Factory
Strong knowledge of security best practices and compliance standards (OWASP, SOC 2, IAM, Secrets Management, Certificate Management), including AI security considerations
Demonstrated ability to balance system reliability with development velocity and business needs
Experience with performance tuning and scalability optimization for high-traffic applications
Excellent communication skills with the ability to influence technical and non-technical stakeholders
Bachelor's degree in Computer Science, Engineering, or equivalent practical experience

Experience integrating and managing AI services and APIs (OpenAI, Claude, or similar) within production infrastructure
Experience building or managing infrastructure for AI/ML model training, inference, and deployment at scale
Hands-on experience with Redis in high-throughput, horizontally scaled deployments
Hands-on experience with prompt engineering, RAG systems, or fine-tuning LLMs for operational use cases
Experience with in-memory analytics databases (DuckDB or similar)
Experience with distributed search and analytics platforms (Elastic or similar)
Experience with TeamCity and Octopus Deploy
Experience with microservices architecture and RESTful API design patterns
Experience with database optimization and data modeling for both SQL and NoSQL systems
Experience with version control workflows (Git flow, trunk-based development)
Experience with technical documentation and knowledge sharing within cross-functional teams
Experience with agile methodologies (Scrum, Kanban) and project management tools (Jira, Confluence)

A fun, fast-paced work environment
Responsible PTO Plan that meets or exceeds state and local medical and family leave laws
11 paid holidays
Community and social events to keep you connected and engaged
Mental Health Benefits
Medical, Dental and Vision insurance
Company-paid Group Life Insurance, Short- and Long-Term Disability
Flexible Spending Account & Health Savings Account
Aflac Benefits – Critical Illness, Cancer Protection, & Hospital Choice
Pet Insurance
401(k) with company match and eligibility on Day 1 of employment
2 Paid Volunteer Time Off Days
And much more!
