Lead Systems Engineer

DTCC
Jersey City, NJ
Hybrid

About The Position

The Lead Systems Engineer (Windows OS Operations, Private Cloud) is a critical role responsible for the stability, security, and long‑term evolution of the enterprise Windows private cloud platform. This role serves as the technical authority for Windows operating system architecture, ensuring highly available, resilient, and compliant platforms that support mission‑critical business applications. By defining architectural standards, driving modernization initiatives, and embedding operational best practices, the role directly reduces infrastructure risk, improves platform reliability, and enables predictable, high‑quality service delivery within a regulated enterprise environment. In addition, this role shapes the future of Windows operations through automation, standardization, and cross‑functional technical leadership. The Lead Systems Engineer influences how Windows platforms are designed, built, secured, and operated, leading efforts across OS lifecycle management, Active Directory and core services architecture, security hardening, capacity planning, and disaster recovery readiness. Acting as a trusted advisor to senior leadership and engineering teams, the role translates business and regulatory requirements into scalable, secure technical solutions, while mentoring senior engineers and establishing reference architectures that drive operational efficiency, cost optimization, and continuous improvement across the Windows private cloud ecosystem.

Requirements

  • Minimum of 6 years of related experience
  • Bachelor's degree preferred or equivalent experience
  • Strong hands-on experience administering enterprise-scale Windows environments, including Windows Server 2016/2019/2022, in large, regulated enterprises.
  • Deep expertise in Windows OS and core platform services, including Active Directory, Group Policy, DNS, DHCP, clustering, networking, storage, and file system management.
  • Proven experience supporting mission-critical, high-availability Windows platforms within regulated and highly controlled environments.
  • Strong knowledge of ITIL processes, including Incident, Problem, Change, and Release Management, with demonstrated operational discipline.
  • Demonstrated leadership in driving and resolving Critical and Major production incidents, coordinating across infrastructure, application, and security teams.
  • Proficiency in automation and scripting using PowerShell, with experience integrating automation into operational workflows (patching, remediation, provisioning, and compliance).
  • Strong understanding of virtualization technologies (VMware ESXi or equivalent) and Windows virtual infrastructure operations.
  • Experience supporting cloud and hybrid environments, preferably AWS, with Windows-based workloads and integrations across on‑prem and cloud platforms.
  • Excellent troubleshooting, analytical, and communication skills, with the ability to lead teams, make sound decisions, and operate effectively under pressure.

Responsibilities

  • Provide advanced Level 2 / Level 3 production support for Windows server environments across the enterprise.
  • Manage patching, upgrades, and lifecycle management of Windows and Linux platforms in alignment with security, compliance, and enterprise standards.
  • Troubleshoot and resolve OS-level, hardware, and platform issues, including CPU, memory, disk, network, and storage services.
  • Lead or contribute to automation and scripting initiatives using Ansible, PowerShell, Bash, Python, or similar orchestration tools to improve operational efficiency and reduce manual effort.
  • Ensure availability, performance, resilience, and recoverability of production environments through proactive monitoring, maintenance, and capacity planning.
  • Act as a key responder for Critical and Major production incidents, driving end‑to‑end restoration, root cause analysis, remediation, and post‑incident reviews.
  • Support application deployments and collaborate closely with application, database, middleware, storage, network, and security teams.
  • Develop and deliver operational metrics, dashboards, and KPIs across infrastructure platforms, providing actionable insights to leadership.
  • Drive platform standardization, best practices, and continuous improvement initiatives across distributed environments.
  • Serve as a technical escalation point and mentor for engineers across Linux and platform operations teams.

Benefits

  • Competitive compensation, including base pay and annual incentive
  • Comprehensive health and life insurance and well-being benefits, based on location
  • Pension / Retirement benefits
  • Paid Time Off and Personal/Family Care, and other leaves of absence when needed to support your physical, financial, and emotional well-being.
  • DTCC offers a flexible/hybrid model of 3 days onsite and 2 days remote (onsite Tuesdays, Wednesdays and a third day unique to each team or employee).