Site Reliability Engineer

Armor Defense Inc • Plano, TX

55d • Hybrid

About The Position

The Site Reliability Engineer reports to the Manager, SRE & Platform Engineering, and contributes to the reliability, availability, and performance of Armor’s production infrastructure. This position operates across hybrid cloud environments, including private cloud, public cloud, and virtualization platforms. The role requires independent judgment and discretion in diagnosing infrastructure issues, building automation, and supporting incident response. This is a hands-on engineering role for someone early in their SRE or infrastructure career who learns quickly, automates by instinct, and wants to grow into a senior infrastructure engineer at a security company. The role is hybrid, with on-site presence three days a week (Tuesday, Wednesday, and Thursday) in the Plano, Texas area.

Requirements

2-4 years of experience in SRE, DevOps, systems administration, or infrastructure engineering.
Working knowledge of Linux and Windows system administration in production environments.
Familiarity with at least one public cloud platform (AWS, Azure, or GCP).
Exposure to containerization and Kubernetes concepts; hands-on experience preferred but not required.
Scripting ability in at least one of Python, PowerShell, or Bash, with a willingness to develop proficiency across all three.
Familiarity with infrastructure-as-code concepts (Terraform, Ansible, or equivalent); production experience preferred but not required.
Familiarity with monitoring and observability tools (Prometheus, Grafana, Datadog, ELK, or equivalent).
Basic networking knowledge, including DNS, firewalls, and load balancing.
Familiarity with version control systems (Git) and CI/CD concepts.
Willingness and demonstrated ability to learn new technologies quickly. Armor’s infrastructure includes VMware vSphere, Proxmox, NSX-T, Zerto, Rubrik, Entra ID, and Active Directory. Proficiency across these platforms is expected to develop within the first year.
Familiarity with security and compliance frameworks (PCI-DSS, HIPAA, SOC 2) is a plus; willingness to learn is required.
Proficiency with AI-assisted development tools (Claude Code, GitHub Copilot, or equivalent).
Understanding of AI/LLM security risks, including prompt injection, data leakage, and model limitations.
Ability to critically evaluate AI-generated outputs for accuracy and security implications.
Strong troubleshooting instincts and a bias toward automation over manual processes.
Strong written and verbal communication skills.
Bachelor’s degree in Computer Science, Information Technology, or equivalent professional experience.

Nice To Haves

Hands-on experience with containerization and Kubernetes.
Production experience with infrastructure-as-code (Terraform, Ansible, or equivalent).
Familiarity with security and compliance frameworks (PCI-DSS, HIPAA, SOC 2).

Responsibilities

Administer and maintain production infrastructure across VMware vSphere, Proxmox, and NSX-T environments under the guidance of senior engineers.
Support Microsoft Entra ID, on-premises Active Directory, and Office 365 administration, including Exchange Online, SharePoint, Teams, and Intune.
Assist in managing workloads across VMware, AWS, Azure, and OCI environments.
Write automation scripts (Python, PowerShell, Bash) to reduce manual operational tasks, improve monitoring, and support self-healing infrastructure.
Contribute to infrastructure-as-code using Terraform, with a focus on learning modularity and reusable patterns.
Build and maintain dashboards and alerting using monitoring platforms such as Datadog, Prometheus, Grafana, or Splunk.
Support disaster recovery operations, including Zerto replication and Rubrik backup infrastructure.
Participate in incident response, root cause analysis, and blameless postmortems to drive continuous improvement.
Participate in on-call rotations with mentorship and escalation support from senior team members.
Assist with vulnerability and patch management processes through automation.
Collaborate with engineering, product, and security teams on platform stability and production readiness.

Benefits

Commitment to Growth: A growth mindset that encourages continuous learning and improvement with adaptability in the face of challenges.
Integrity Always: Sustain trust through transparency and honesty in all actions and interactions, regardless of circumstances.
Empathy In Action: Active understanding, compassion, and support for the needs of others through genuine connection.
Immediate Impact: Taking initiative with swift, informed actions to deliver positive outcomes.
Follow-Through: Dedication to delivering finished results with attention to quality and detail to achieve the desired outcomes.
