IT Systems Supervisor

Plymouth Rock Assurance, Boston, MA

About The Position

We are looking for a change agent, not a caretaker. This role demands an automation-first mindset, strong technical depth across Windows and Linux environments, and the leadership presence to raise the bar for how our infrastructure team operates. You will own the health, reliability, and evolution of the firm's compute and M365 environment while developing a high-performing team.

Essential Functions and Responsibilities:

Compute & M365 Stability and Availability

  • Ensure the reliability, performance, and availability of all compute infrastructure and M365 services: on-prem servers, virtual machines, cloud instances, Exchange Online, Teams, SharePoint, and remote access services
  • Own regular maintenance windows for patching, upgrades, and housekeeping across Windows, Linux, AWS compute, and M365 workloads
  • Establish and enforce operational standards, runbooks, and procedures that create consistency and reduce dependency on tribal knowledge
  • Monitor compute and M365 environment health continuously; act on signals before they become incidents
  • Manage compute capacity across on-prem and AWS, ensuring right-sized, available resources that meet demand without unnecessary cost

M365 Administration

  • Own firm-wide M365 administration including Exchange Online, Teams, SharePoint, OneDrive, Entra ID, and associated services
  • Manage tenant configuration, licensing, service health, and policy governance across the M365 platform
  • Partner with the business to understand collaboration needs and translate them into well-governed M365 solutions
  • Maintain a clear operational model for M365 covering administration, support escalation, and change management
  • Stay current on the M365 roadmap; evaluate new capabilities and drive adoption where they deliver business value

Incident Management

  • Lead incident response for P1/P2 compute and M365 events; coordinate across teams and drive to resolution with urgency and clarity
  • Establish and maintain incident runbooks, post-mortems, and lessons-learned processes
  • Track incident trends and use data to drive systemic fixes and reduce repeat events
  • Ensure clear on-call coverage, escalation paths, and communication protocols are in place and understood by the team

Vulnerability & Security Operations

  • Own the vulnerability management lifecycle across compute: scanning, prioritization, remediation tracking, and reporting
  • Partner with the Infosec team to align operational processes with security policies, ensuring consistent enforcement across compute and M365
  • Serve as the operational bridge between infrastructure and security, translating policy requirements into executable team workflows
  • Ensure timely closure of critical and high findings, with clear escalation paths for exceptions and risk acceptance

Observability Platform

  • Define, deliver, and own the firm's observability platform for compute and M365, spanning open source and commercial tooling
  • Architect a unified view of environment health across on-prem, AWS, and M365 (metrics, logs, traces, and events)
  • Establish proactive alerting, dashboards, and runbooks to reduce MTTR
  • Drive adoption of the platform across Windows and Linux teams, ensuring consistent coverage and actionable signal

Automation & Tooling

  • Champion an automation-first culture; eliminate manual, repetitive operational tasks through scripting and orchestration (Ansible, PowerShell, Bash, Terraform)
  • Drive infrastructure-as-code adoption for compute provisioning and configuration across on-prem and AWS
  • Leverage M365 automation capabilities (Power Automate, Graph API, and PowerShell) to streamline administration and reduce manual effort
  • Identify and implement tooling to reduce toil and accelerate delivery

Technology Evolution & Technical Debt

  • Maintain a living inventory of technical debt across all compute and M365 ownership areas: server platforms, operating systems, virtualization, messaging, collaboration, and remote access
  • Develop and own multi-horizon technology roadmaps that balance operational stability with modernization
  • Make the case for investment: translate technical debt and risk into business impact for leadership
  • Establish a cadence of review and retirement, ensuring aging technologies are actively replaced, not just maintained
  • Champion forward-looking decisions on platform lifecycle, vendor strategy, and architectural direction

Cross-Functional Partnership & Technology Strategy

  • Partner with the AppDev team to ensure prompt, reliable delivery of compute services, with clear cost accountability and service-level expectations on both sides
  • Collaborate with FinOps to drive compute and M365 cost optimization, ensuring spend is visible, justified, and continuously improved
  • Partner with Architecture to create, update, and execute against technology roadmaps that align compute and M365 direction with firm-wide strategy
  • Continuously evaluate emerging technologies to reduce risk, lower costs, and improve the reliability, scalability, maintainability, and security of the environment
  • Represent compute and M365 capabilities and constraints in cross-functional planning forums, ensuring operational realities inform strategic decisions

Team Leadership & Workload Management

  • Lead and mentor a team of Windows and Linux admins; bridge the gap between both disciplines
  • Manage the operational queue: balance incident response, project work, and proactive improvements
  • Drive accountability through sprint planning and ticket hygiene with clear escalation paths
  • Conduct regular 1:1s, performance reviews, and career development conversations

Project & Change Accountability

  • Own end-to-end delivery of compute and M365 projects: on scope, on time, with documented outcomes
  • Manage change control processes; reduce risk through peer review and staged rollouts
  • Communicate status, risks, and blockers to leadership proactively

Requirements

  • 5-7 years managing Windows Server and Linux compute infrastructure in enterprise environments
  • Proven people leadership experience with mixed-skill technical teams
  • Deep hands-on experience with M365 administration at the enterprise level — Exchange Online, Teams, SharePoint, Entra ID, and tenant governance
  • Experience managing the transition from or coexistence with on-premises Exchange
  • Expertise with VMware ESX/vSphere virtualization platforms
  • AWS compute experience including EC2, Auto Scaling Groups, and WorkSpaces
  • Experience with VDI/remote access platforms (Citrix, Ivanti Secure)
  • Demonstrated experience defining and delivering an observability or monitoring platform
  • Experience owning vulnerability management and patch compliance programs across compute
  • Track record of developing technology roadmaps and systematically retiring technical debt
  • Experience partnering with FinOps, AppDev, or Architecture teams in a cross-functional capacity
  • Strong scripting and automation skills (PowerShell, Graph API, Bash, or equivalent)

Nice To Haves

  • Experience with infrastructure-as-code tools (Terraform, Ansible, CloudFormation)
  • Hands-on experience with open source observability tools (e.g. Prometheus, Grafana, OpenTelemetry, ELK)
  • Familiarity with ITSM and workload management platforms
  • AWS Solutions Architect or equivalent cloud certification
  • Microsoft 365 certification (MS-102 or equivalent)
  • Background in compute capacity planning and performance tuning at scale

Responsibilities

  • Ensure the reliability, performance, and availability of all compute infrastructure and M365 services
  • Own regular maintenance windows for patching, upgrades, and housekeeping across Windows, Linux, AWS compute, and M365 workloads
  • Establish and enforce operational standards, runbooks, and procedures
  • Monitor compute and M365 environment health continuously
  • Manage compute capacity across on-prem and AWS
  • Own firm-wide M365 administration
  • Manage tenant configuration, licensing, service health, and policy governance across the M365 platform
  • Partner with the business to understand collaboration needs and translate them into well-governed M365 solutions
  • Maintain a clear operational model for M365
  • Stay current on the M365 roadmap
  • Lead incident response for P1/P2 compute and M365 events
  • Establish and maintain incident runbooks, post-mortems, and lessons-learned processes
  • Track incident trends and use data to drive systemic fixes and reduce repeat events
  • Own the vulnerability management lifecycle across compute
  • Partner with the Infosec team to align operational processes with security policies
  • Serve as the operational bridge between infrastructure and security
  • Define, deliver, and own the firm's observability platform for compute and M365
  • Architect a unified view of environment health across on-prem, AWS, and M365
  • Establish proactive alerting, dashboards, and runbooks to reduce MTTR
  • Champion an automation-first culture
  • Drive infrastructure-as-code adoption for compute provisioning and configuration across on-prem and AWS
  • Leverage M365 automation capabilities
  • Maintain a living inventory of technical debt across all compute and M365 ownership areas
  • Develop and own multi-horizon technology roadmaps that balance operational stability with modernization
  • Make the case for investment
  • Establish a cadence of review and retirement
  • Champion forward-looking decisions on platform lifecycle, vendor strategy, and architectural direction
  • Partner with the AppDev team to ensure prompt, reliable delivery of compute services
  • Collaborate with FinOps to drive compute and M365 cost optimization
  • Partner with Architecture to create, update, and execute against technology roadmaps
  • Continuously evaluate emerging technologies to reduce risk, lower costs, and improve the reliability, scalability, maintainability, and security of the environment
  • Represent compute and M365 capabilities and constraints in cross-functional planning forums
  • Lead and mentor a team of Windows and Linux admins
  • Manage the operational queue
  • Drive accountability through sprint planning and ticket hygiene with clear escalation paths
  • Conduct regular 1:1s, performance reviews, and career development conversations
  • Own end-to-end delivery of compute and M365 projects
  • Manage change control processes
  • Communicate status, risks, and blockers to leadership proactively