Solutions Architect, Technology Operations & Service Delivery

Antares · Chicago, IL
$160,000 - $225,000 · Hybrid

About The Position

Antares Capital is seeking a Vice President, Solutions Architect - Technology Operations & Service Delivery to lead and evolve our Production Support Organization, spanning Level 1 and Level 2 operations. This is a strategic, hands-on leadership role responsible for the reliability, stability, and performance of our technology platforms and for building a world-class operational structure driven by automation, data, and continuous improvement. You will own the end-to-end health of our production ecosystem, partnering with Engineering, Infrastructure, Cybersecurity, and Business stakeholders to ensure our systems are resilient, observable, and scalable. You will establish strong KPI frameworks, drive automation-first thinking, manage third-party vendor relationships, and lead the response to incidents and outages as incident commander. This role is ideal for a leader who excels at translating operational discipline into strategic execution plans and seeing them through to completion.

Requirements

  • 10+ years of technology operations experience, including at least 5 years leading L1 and/or L2 production support teams in a complex, multi-platform financial services or enterprise environment.
  • Demonstrated ability to design and operate KPI/SLA frameworks that drive measurable improvements in system availability, MTTR, and incident reduction.
  • Proven track record managing third-party technology vendors and managed service providers, including SLA enforcement, contractual accountability, and integration of vendor support workflows into internal ITSM processes.
  • Hands-on experience as incident commander during major outages: able to quickly assess system dependencies, mobilize the right teams, communicate clearly to stakeholders, and drive structured post-incident reviews with accountable remediation plans.
  • Experience building automation-driven support operations using scripting, workflow tools (e.g., Power Automate, ServiceNow workflows), and AI/ML-assisted triage.
  • Awareness of applied AI capabilities relevant to IT operations (AIOps).
  • Strong execution discipline: able to author detailed operational and project plans, establish milestones and owners, and drive completion with high accountability.
  • Hands-on experience configuring and reporting in ServiceNow (incident, change, problem management).
  • Experience with Control-M or equivalent job scheduling and orchestration platforms.
  • Strong proficiency with Datadog or similar observability platforms for logging, monitoring, and alerting.
  • Experience with Azure Cloud services and cloud operations fundamentals.
  • Solid understanding of distributed systems, fault-tolerant design patterns, and system dependency mapping.
  • Ability to quickly construct and communicate application topology during high-pressure outage scenarios.
  • Excellent stakeholder communication and relationship management skills; able to operate with credibility at the executive level while maintaining close working relationships with Engineering, Infrastructure, and Cybersecurity peers.
  • Must have unrestricted authorization to work in the United States.
  • Must be willing to comply with pre-employment screening, including but not limited to drug testing, reference verification, and background check.
  • Must be willing to work from the Chicago or New York office.

Nice To Haves

  • Experience in financial services or similarly regulated industries is strongly preferred.
  • Familiarity with private credit, loan administration, or fund accounting technology platforms is a plus.

Responsibilities

  • Build, lead, and mature both L1 and L2 support teams, establishing clear escalation paths, ownership accountability, and a high-performance service culture.
  • Own end-to-end incident monitoring, triage, and resolution across all production environments.
  • Design and operationalize a comprehensive KPI and SLA/SLO framework covering incident volume, MTTR, MTTD, first-call resolution, and system availability.
  • Present regular metrics and trending analysis to Technology leadership.
  • Serve as the executive incident commander during major outages, rapidly mobilizing cross-functional teams, managing communications, and driving root cause analysis.
  • Develop and execute structured remediation plans that address recurring patterns and systemic dependencies to prevent recurrence.
  • Champion an automation-first culture by identifying and implementing AI-assisted and scripted solutions to reduce manual toil, accelerate resolution, and improve observability.
  • Maintain awareness of emerging web and AI technologies and assess their applicability to production operations.
  • Own relationships with key technology vendors and managed service providers.
  • Integrate external support processes seamlessly with Antares’ internal workflows, hold vendors accountable to contractual SLAs, and lead escalations to vendor executive teams when needed.
  • Build trusted partnerships with Development, Infrastructure, and Cybersecurity teams to align on release readiness, change management, security controls, and platform health.
  • Act as the connective tissue between operational teams and strategic technology initiatives.
  • Translate operational gaps and technology opportunities into actionable roadmaps and execution plans.
  • Own program milestones end-to-end, driving accountability across teams and delivering measurable outcomes on schedule.
  • Collaborate with Engineering teams to ensure systems are designed for fault tolerance and meet RTO/RPO targets.
  • Oversee replay and recovery capabilities for critical business processes, and maintain runbook documentation to reduce dependency on tribal knowledge.
  • Partner with Dev and Infrastructure teams to ensure systems scale for business growth, handle peak volumes, and employ autoscaling for cost efficiency.
  • Drive adoption of Datadog or equivalent platforms for end-to-end tracing, proactive alerting, and fast root-cause identification.
  • Configure and manage ServiceNow for incident, change, and problem management.
  • Leverage Control-M or equivalent schedulers to maintain job chain integrity and proactively identify scheduling risks.

Benefits

  • medical
  • dental
  • vision coverage
  • employer paid short & long-term disability and life insurance
  • 401(k)
  • profit sharing
  • paid time off
  • Maven family & fertility benefit
  • parental leave (including adoption, surrogacy, and foster placement)
  • other voluntary benefits