Associate Site Reliability Engineer (SRE)

Exegy•St. Louis, MO

19h

About The Position

Exegy is seeking an Associate Site Reliability Engineer (SRE) to help support the reliability, scalability, and performance of our global data center and hybrid infrastructure environments. This role is an excellent opportunity to build foundational skills in systems engineering, automation, and operational rigor while supporting Exegy's mission-critical market data products.

Requirements

Bachelor’s degree in Computer Science, Information Technology, Engineering, or equivalent practical experience (recent graduates are encouraged to apply)
2–4 years of experience or relevant internships in Systems Administration, IT Operations, or Site Reliability Engineering
Foundational understanding of Linux and/or Windows server administration
Basic exposure to or familiarity with scripting languages (e.g., PowerShell, Bash, Python)
Understanding of fundamental networking concepts and data center operations
Familiarity with basic monitoring and alerting concepts
Strong willingness to learn new technologies (like VMware, AWS/Azure/GCP, and IaC tools) and grow within the SRE field
Ability to participate in an on-call rotation with a collaborative, team-first mindset

Responsibilities

Maintain operational processes, utilize automation tools, monitor system health, and help troubleshoot failures.
Collaborate with Infrastructure, Network Engineering, Security, and DevOps teams to learn and apply best practices for delivering resilient and secure platforms.
Provision and deprovision users across Active Directory, Okta, and Exchange.
Manage access requests and permissions across on-prem and cloud systems.
Troubleshoot Tier 1/2 support issues across hardware, software, connectivity, and accounts.
Support device provisioning, configuration, and onboarding.
Maintain email distribution lists and shared mailboxes.
Provide general office and A/V technology support.
Assist in maintaining uptime across core systems, including compute, storage, virtualization, and network infrastructure.
Provide day-to-day support for production services across on-prem data centers, co-locations, and hybrid cloud environments.
Participate in a 24×7 on-call rotation (with the support and escalation path of senior engineers) and assist in major incident response.
Shadow and assist senior team members in root cause analysis (RCA) and post mortem documentation.
Help maintain and execute existing automation scripts using tools like Ansible, Terraform, PowerShell, Python, or Puppet.
Learn to automate routine operational workflows, configuration management tasks, and deployments.
Assist in managing Infrastructure-as-Code (IaC) templates under the guidance of senior SREs.
Monitor systems, networks, and applications using platforms such as Prometheus, Grafana, Datadog, New Relic, SolarWinds, or Splunk.
Respond to system alerts, update health dashboards, and perform routine log analysis.
Assist in conducting proactive performance health checks across hardware, virtualization, and OS layers (Windows/Linux).
Assist with the management of physical and virtual data center infrastructure, including basic hardware lifecycle tasks and capacity monitoring.
Perform scheduled patching, firmware upgrades, and routine system maintenance.
Participate in Disaster Recovery/Business Continuity (DR/BCP) testing alongside senior engineers.
Help maintain secure baseline configurations aligned to industry standards.
Partner with Network, Security, DevOps, and Application Engineering teams to execute cross-departmental tasks.
Actively participate in architecture and scaling discussions to learn system design principles.
Assist with basic office desktop support; A/V systems, employee onboarding, etc.
Create, update, and follow operational runbooks, playbooks, and standard operating procedures (SOPs).
Follow and support security controls, including MFA, access management, and logging protocols.
Assist in gathering evidence and generating reports for audit requirements (SOC 2, ISO 27001).
Apply patches and updates as part of routine vulnerability remediation efforts.