IT Systems Supervisor

Plymouth Rock Assurance • Boston, MA

39d

About The Position

We are looking for a change agent, not a caretaker. This role demands an automation-first mindset, strong technical depth across Windows and Linux environments, and the leadership presence to raise the bar for how our infrastructure team operates. You will own the health, reliability, and evolution of the firm's compute and M365 environment while developing a high-performing team.

Essential Functions and Responsibilities:

Compute & M365 Stability and Availability
Ensure the reliability, performance, and availability of all compute infrastructure and M365 services: on-prem servers, virtual machines, cloud instances, Exchange Online, Teams, SharePoint, and remote access services
Own regular maintenance windows for patching, upgrades, and housekeeping across Windows, Linux, AWS compute, and M365 workloads
Establish and enforce operational standards, runbooks, and procedures that create consistency and reduce dependency on tribal knowledge
Monitor compute and M365 environment health continuously; act on signals before they become incidents
Manage compute capacity across on-prem and AWS, ensuring right-sized, available resources that meet demand without unnecessary cost

M365 Administration
Own firm-wide M365 administration including Exchange Online, Teams, SharePoint, OneDrive, Entra ID, and associated services
Manage tenant configuration, licensing, service health, and policy governance across the M365 platform
Partner with the business to understand collaboration needs and translate them into well-governed M365 solutions
Maintain a clear operational model for M365, covering administration, support escalation, and change management
Stay current on the M365 roadmap; evaluate new capabilities and drive adoption where they deliver business value

Incident Management
Lead incident response for P1/P2 compute and M365 events; coordinate across teams and drive to resolution with urgency and clarity
Establish and maintain incident runbooks, post-mortems, and lessons-learned processes
Track incident trends and use data to drive systemic fixes and reduce repeat events
Ensure clear on-call coverage, escalation paths, and communication protocols are in place and understood by the team

Vulnerability & Security Operations
Own the vulnerability management lifecycle across compute: scanning, prioritization, remediation tracking, and reporting
Partner with the Infosec team to align operational processes with security policies, ensuring consistent enforcement across compute and M365
Serve as the operational bridge between infrastructure and security, translating policy requirements into executable team workflows
Ensure timely closure of critical and high findings, with clear escalation paths for exceptions and risk acceptance

Observability Platform
Define, deliver, and own the firm's observability platform for compute and M365, spanning open source and commercial tooling
Architect a unified view of environment health across on-prem, AWS, and M365 (metrics, logs, traces, and events)
Establish proactive alerting, dashboards, and runbooks to reduce MTTR
Drive adoption of the platform across Windows and Linux teams, ensuring consistent coverage and actionable signal

Automation & Tooling
Champion an automation-first culture; eliminate manual, repetitive operational tasks through scripting and orchestration (Ansible, PowerShell, Bash, Terraform)
Drive infrastructure-as-code adoption for compute provisioning and configuration across on-prem and AWS
Leverage M365 automation capabilities (Power Automate, Graph API, and PowerShell) to streamline administration and reduce manual effort
Identify and implement tooling to reduce toil and accelerate delivery

Technology Evolution & Technical Debt
Maintain a living inventory of technical debt across all compute and M365 ownership areas: server platforms, operating systems, virtualization, messaging, collaboration, and remote access
Develop and own multi-horizon technology roadmaps that balance operational stability with modernization
Make the case for investment: translate technical debt and risk into business impact for leadership
Establish a cadence of review and retirement, ensuring aging technologies are actively replaced, not just maintained
Champion forward-looking decisions on platform lifecycle, vendor strategy, and architectural direction

Cross-Functional Partnership & Technology Strategy
Partner with the AppDev team to ensure prompt, reliable delivery of compute services, with clear cost accountability and service-level expectations on both sides
Collaborate with FinOps to drive compute and M365 cost optimization, ensuring spend is visible, justified, and continuously improved
Partner with Architecture to create, update, and execute against technology roadmaps that align compute and M365 direction with firm-wide strategy
Continuously evaluate emerging technologies to reduce risk, lower costs, and improve the reliability, scalability, maintainability, and security of the environment
Represent compute and M365 capabilities and constraints in cross-functional planning forums, ensuring operational realities inform strategic decisions

Team Leadership & Workload Management
Lead and mentor a team of Windows and Linux admins; bridge the gap between both disciplines
Manage operational queue: balance incident response, project work, and proactive improvements
Drive accountability through sprint planning and ticket hygiene with clear escalation paths
Conduct regular 1:1s, performance reviews, and career development conversations

Project & Change Accountability
Own end-to-end delivery of compute and M365 projects: on scope, on time, with documented outcomes
Manage change control processes; reduce risk through peer review and staged rollouts
Communicate status, risks, and blockers to leadership proactively

Requirements

5-7 years managing Windows Server and Linux compute infrastructure in enterprise environments
Proven people leadership experience with mixed-skill technical teams
Deep hands-on experience with M365 administration at the enterprise level — Exchange Online, Teams, SharePoint, Entra ID, and tenant governance
Experience managing the transition from or coexistence with on-premises Exchange
Expertise with VMware ESX/vSphere virtualization platforms
AWS compute experience including EC2, Auto Scaling Groups, and WorkSpaces
Experience with VDI/remote access platforms (Citrix, Ivanti Secure)
Demonstrated experience defining and delivering an observability or monitoring platform
Experience owning vulnerability management and patch compliance programs across compute
Track record of developing technology roadmaps and systematically retiring technical debt
Experience partnering with FinOps, AppDev, or Architecture teams in a cross-functional capacity
Strong scripting and automation skills (PowerShell, Graph API, Bash, or equivalent)

Nice To Haves

Experience with infrastructure-as-code tools (Terraform, Ansible, CloudFormation)
Hands-on experience with open source observability tools (e.g. Prometheus, Grafana, OpenTelemetry, ELK)
Familiarity with ITSM and workload management platforms
AWS Solutions Architect or equivalent cloud certification
Microsoft 365 certification (MS-102 or equivalent)
Background in compute capacity planning and performance tuning at scale

Responsibilities

Ensure the reliability, performance, and availability of all compute infrastructure and M365 services
Own regular maintenance windows for patching, upgrades, and housekeeping across Windows, Linux, AWS compute, and M365 workloads
Establish and enforce operational standards, runbooks, and procedures
Monitor compute and M365 environment health continuously
Manage compute capacity across on-prem and AWS
Own firm-wide M365 administration
Manage tenant configuration, licensing, service health, and policy governance across the M365 platform
Partner with the business to understand collaboration needs and translate them into well-governed M365 solutions
Maintain a clear operational model for M365
Stay current on the M365 roadmap
Lead incident response for P1/P2 compute and M365 events
Establish and maintain incident runbooks, post-mortems, and lessons-learned processes
Track incident trends and use data to drive systemic fixes and reduce repeat events
Own the vulnerability management lifecycle across compute
Partner with the Infosec team to align operational processes with security policies
Serve as the operational bridge between infrastructure and security
Define, deliver, and own the firm's observability platform for compute and M365
Architect a unified view of environment health across on-prem, AWS, and M365
Establish proactive alerting, dashboards, and runbooks to reduce MTTR
Champion an automation-first culture
Drive infrastructure-as-code adoption for compute provisioning and configuration across on-prem and AWS
Leverage M365 automation capabilities
Maintain a living inventory of technical debt across all compute and M365 ownership areas
Develop and own multi-horizon technology roadmaps that balance operational stability with modernization
Make the case for investment
Establish a cadence of review and retirement
Champion forward-looking decisions on platform lifecycle, vendor strategy, and architectural direction
Partner with the AppDev team to ensure prompt, reliable delivery of compute services
Collaborate with FinOps to drive compute and M365 cost optimization
Partner with Architecture to create, update, and execute against technology roadmaps
Continuously evaluate emerging technologies to reduce risk, lower costs, and improve the reliability, scalability, maintainability, and security of the environment
Represent compute and M365 capabilities and constraints in cross-functional planning forums
Lead and mentor a team of Windows and Linux admins
Manage operational queue
Drive accountability through sprint planning and ticket hygiene with clear escalation paths
Conduct regular 1:1s, performance reviews, and career development conversations
Own end-to-end delivery of compute and M365 projects
Manage change control processes
Communicate status, risks, and blockers to leadership proactively