Senior Site Reliability Engineer

AssetMark Financial Holdings•Charlotte, NC

57d•$140,000 - $180,000•Hybrid

About The Position

AssetMark is a leading strategic provider of innovative investment and consulting solutions serving independent financial advisors. We provide investment, relationship, and practice management solutions that advisors use in helping clients achieve wealth, independence, and purpose.

The Opportunity

We are seeking a Site Reliability Engineer (SRE) to join our Charlotte-based engineering team. This role sits at the center of platform resilience, ensuring high availability, performance, recoverability, and operational maturity across AssetMark’s production systems.

This is not a traditional operations role. Our SREs are engineers first: designing automation, building observability frameworks, improving deployment safety, defining reliability standards, and reducing operational toil through code. You will influence architectural decisions, strengthen incident management practices, and raise the reliability bar across both legacy and cloud-native systems.

You will work on systems that operate 24/7, support financial transactions and advisor workflows, and must meet strict regulatory and security requirements. The right candidate is energized by complex distributed systems, high-stakes production environments, and the responsibility of building durable, scalable financial infrastructure.

At AssetMark, reliability is a first-order expression of client obsession. Our SRE team plays a critical role in delivering the consistent, trusted technology experience that advisors depend on to run their businesses.

We can only consider candidates for this position who are able to accommodate a hybrid work schedule and are close to our Charlotte, NC office.

Requirements

Strong software engineering skills in .NET / C# (or Python, Java, or similar)
Experience operating distributed systems in production
Deep understanding of SRE principles: SLIs/SLOs, error budgets, toil reduction, incident management
Experience with Azure (or AWS/GCP), including compute, networking, and managed services
Knowledge of containerization and orchestration (Docker, Kubernetes preferred)
Experience with monitoring, logging, tracing, and alerting tools
Familiarity with CI/CD pipelines, automation, and Infrastructure-as-Code
Understanding of security best practices in regulated enterprise environments
Bachelor's degree in Computer Science, Software Engineering, or a related technical field
7-10 years of software engineering experience in Site Reliability Engineering, DevOps, Platform Engineering, or production operations
Proven experience in troubleshooting and improving production system reliability
Experience supporting 24/7 systems, batch processing, and mission-critical workloads
Strong collaboration skills across engineering, security, and infrastructure teams
Experience working in Agile/Scrum environments
Experience building APIs, services, and/or platform components
Understanding of enterprise integration patterns, service-oriented architecture, and large-scale system design
Experience with DevOps practices, cross-functional collaboration, and agile/scrum development methodologies

Nice To Haves

Experience supporting financial services or highly regulated systems

Responsibilities

Design, implement, and continuously improve the reliability, availability, and performance of critical AssetMark systems (batch, APIs, integrations, and customer-facing platforms)
Define and operationalize SLIs, SLOs, and error budgets for critical services in partnership with engineering and product teams
Participate in on-call rotations, incident response, and major incident management
Lead and contribute to blameless post-incident reviews, driving root cause analysis and measurable reliability improvements
Proactively identify reliability risks and lead remediation efforts before they impact clients
Build and maintain end-to-end observability across applications, infrastructure, and integrations (metrics, logs, traces, alerts)
Implement actionable monitoring and alerting to reduce noise and improve signal quality
Partner with application teams to instrument services using best-in-class observability practices
Ensure visibility into system health, capacity, performance, and failure modes across environments
Identify repetitive operational tasks and automate them through code
Improve deployment reliability through automation, self-service tooling, and safe rollout patterns
Reduce manual intervention in batch processing, integrations, and operational workflows
Apply Infrastructure-as-Code and configuration automation to improve consistency and repeatability
Support reliability of Azure-based infrastructure, containerized workloads, and hybrid environments
Partner with platform, DevOps, and infrastructure teams to improve resilience, scalability, and recovery
Contribute to capacity planning, performance tuning, and cost-aware reliability decisions
Ensure systems meet RTO/RPO, backup, and disaster recovery expectations
Embed security, compliance, and risk controls into operational practices
Work closely with Security and Compliance teams to meet financial services regulatory requirements
Ensure production systems follow least privilege, secure configuration, and auditability standards
Support vulnerability remediation and secure operational processes
Partner with application engineering teams to improve production readiness and operational maturity
Influence system design by advocating for reliability-first architectural decisions
Provide guidance on operational best practices, deployment safety, and observability standards
Document operational patterns, runbooks, and reliability guidelines in Confluence
Act as a reliability advocate across AssetMark engineering teams