SRE Technical Architect

Perficient•Charlotte, NC

27d•$81,978 - $178,090•Remote

About The Position

We currently have a career opportunity for an SRE Architect to join our team located in the Fort Mill / Charlotte, NC area. While the majority of the team is located here, the successful candidate can be located anywhere in the continental US, with preference for the Eastern and Central time zones.

Job Overview: Join Perficient as an SRE Architect to instrument mission-critical systems, build golden-signal dashboards, and close gaps across our enterprise observability landscape. In this role you will own end-to-end telemetry, from services to front end, so teams can ship faster and operate with confidence.

The SRE Technical Architect provides technology direction, ensures project implementation compliance, and uses technology research to innovate, integrate, and manage technology solutions. As a Technology Architect, you will contribute significantly to identifying best-fit architectural solutions for one or more projects. You will collaborate with some of the best talent in the industry to create and implement innovative, high-quality solutions, and participate in sales and various pursuits focused on our clients' business needs. This role is considered part of the Business Unit Leadership team and may mentor junior architects and/or development team members.

Perficient is always looking for the best and brightest talent, and we need you! We’re a quickly growing, global digital consulting leader, and we’re transforming the world’s largest enterprises and biggest brands. You’ll work with the latest technologies, expand your skills, and become a part of our global community of talented, diverse, and knowledgeable colleagues.

Requirements

Enterprise production experience with Dynatrace OneAgent or OpenTelemetry (services and user interfaces).
Proven ability to instrument .NET services and Angular front-end applications (RUM, distributed tracing, log correlation).
Hands-on experience with AWS API Gateway, Kong API Gateway, and Kong Mesh (metrics, traces, health checks, and policy events).
Experience instrumenting ForgeRock flows and integrating identity telemetry.
CI/CD proficiency with GitHub Actions—adding quality gates for telemetry, linting configs, and secrets management.
Strong grasp of golden signals, SLI/SLO design, and practical alerting (avoiding alert fatigue).
Ability to work autonomously and drive cross-functional change through clear documentation and backlog tasks.
Solid understanding of distributed systems, HTTP, and cloud networking fundamentals.
Demonstrated ability to leverage AI tools to enhance productivity, streamline workflows, and support data-informed task execution.
Familiarity with AI-enhanced platforms is a plus.
A solid understanding of AI capabilities and limitations including ethical considerations is expected.
Bachelor's degree in computer science or related field of study.

Nice To Haves

Experience with Kubernetes, Prometheus, Grafana, AWS CloudWatch/X-Ray, and telemetry pipelines (e.g., OpenTelemetry Collector, OTLP).
Knowledge of service mesh telemetry specifics (sidecar proxies, mTLS, traffic policies).
IaC tooling (e.g., Terraform) for observability resources and dashboards as code.
Prior work with digital identity and ForgeRock admin/SDKs.
Familiarity with Dynatrace (dashboards, Davis AI, topology modeling) or vendor-neutral patterns to avoid lock-in.
Master's degree in computer science or related field of study.

Responsibilities

Define reference architectures for high availability, disaster recovery (DR), multi-region/zone deployments, and fault tolerance.
Architect end-to-end observability: logs, metrics, traces, profiling, and actionable alerting; standardize telemetry schemas and dashboards.
Establish SLOs/SLIs and error budget policies across services; align reliability goals to customer experience and business KPIs.
Lead reliability roadmaps, standards, and guardrails (operational readiness reviews, production readiness checklists).
Implement runbooks, playbooks, and automated diagnostics; enforce alert hygiene (signal/noise, on-call ergonomics).
Define incident response patterns (SEV classification, comms, postmortems, learning reviews) and perform statistical analysis of incident trends.
Own instrumentation across services and UIs: back-ends and Angular front-end apps, ensuring high-quality traces, metrics, and logs with context propagation.
Instrument gateways and mesh: AWS API Gateway, Kong API Gateway, and Kong Mesh to capture request/response telemetry, service health, and mesh traffic.
Integrate identity flows: ForgeRock (Access Management / Identity) telemetry for authn/authz journeys and error paths.
Automate pipelines: Implement observability steps in GitHub Actions (build, test, deploy) to validate telemetry in CI/CD and block non-compliant releases.
Tooling: Leverage Dynatrace OneAgent or OpenTelemetry (enterprise production experience required) to collect and normalize signals and ship them to approved backends.
Dashboards & golden signals: Create durable dashboards oriented to the four golden signals—latency, traffic, errors, saturation—plus service availability, dependency health, and user experience.
Gap analysis & backlog creation: Identify gaps (e.g., missing SLIs, low trace coverage, noisy logs), write actionable project tasks / stories, and partner with dev, SRE, and security to drive closure.
Reliability practices: Define/maintain SLIs/SLOs, alert thresholds, runbooks, and error budgets; collaborate on incident reviews and trend analysis.
Enablement: Coach product teams on instrumentation patterns, standards, and SDK usage to scale observability autonomously.