Observability Engineer

Vantor•McLean, VA

18d•$137,000 - $228,000•Onsite

About The Position

Vantor is forging the new frontier of spatial intelligence, helping decision makers and operators navigate what’s happening now and shape what’s coming next. Vantor is a place for problem solvers, changemakers, and go-getters, where people are working together to help our customers see the world differently, and in doing so, be seen differently. Come be part of a mission, not just a job, where you can: Shape your own future, build the next big thing, and change the world.

To be eligible for this position, you must be a U.S. Person, defined as a U.S. citizen, permanent resident, Asylee, or Refugee.

Note on Cleared Roles: If this position requires an active U.S. Government security clearance, applicants who do not currently hold the required clearance will not be eligible for consideration. Employment for cleared roles is contingent upon verification of clearance status.

Export Control/ITAR: Certain roles may be subject to U.S. export control laws, requiring U.S. person status as defined by 8 U.S.C. 1324b(a)(3).

Please review the job details below. This position requires an active U.S. Government Security Clearance at the TS/SCI level with required polygraph.

We are seeking an Observability Engineer to support the design, implementation, and operation of monitoring, telemetry, and cost-optimization capabilities for mission-critical cloud systems. This role focuses on establishing system visibility, performance baselines, and actionable metrics across applications, infrastructure, and data pipelines, with an emphasis on resiliency, scalability, and cloud cost efficiency. The ideal candidate is hands-on, comfortable working directly in cloud environments, and experienced partnering with development teams to instrument applications, analyze performance, and drive data-backed recommendations for system and cost optimization.

Requirements

Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent experience).
5+ years of experience supporting cloud-based systems with a focus on observability, performance, or reliability.
Strong, hands-on experience implementing monitoring and observability solutions in AWS, including CloudWatch.
Experience building and maintaining dashboards, metrics, alerts, and KPIs for production systems.
Experience analyzing cloud costs and making practical, actionable cost optimization recommendations.
Hands-on experience with system, network, or application load and stress testing.
Experience collaborating directly with development teams to instrument applications and interpret telemetry data.
Willingness to work onsite full time.

Nice To Haves

Demonstrated experience supporting multi-cloud or hybrid cloud environments.
Hands-on experience implementing synthetic monitoring or automated testing solutions (e.g., Selenium, Puppeteer, or similar).
Demonstrated experience designing systems for high availability, resiliency, and horizontal scaling.
Hands-on experience leveraging spot instances, autoscaling groups, or other dynamic infrastructure strategies.
Experience supporting mission-critical or high-visibility production systems.
Demonstrated experience establishing performance baselines, SLIs, and SLOs.
Hands-on experience using Infrastructure-as-Code or automation tools to support observability or operational workflows.

Responsibilities

Design and implement observability and monitoring solutions for cloud-based systems, applications, and data pipelines.
Build and maintain CloudWatch dashboards, metrics, alarms, and KPIs to support operational visibility and decision-making.
Establish baseline performance metrics for systems, applications, and workflows to support trend analysis and optimization.
Analyze cloud usage and spending to identify cost optimization opportunities, including right-sizing, storage tiering, and architectural recommendations.
Support resiliency and scaling strategies, including evaluation and use of spot instances and dynamic scaling options.
Evaluate multi-cloud cost considerations, recommending architecture patterns that balance performance, resiliency, and cost.
Partner with development teams to design and implement application telemetry, including metrics, logging, and synthetic monitoring.
Track telemetry outputs to identify KPIs and provide actionable recommendations for performance, reliability, and user experience improvements.
Conduct network stress testing, application load testing, and system performance testing to identify bottlenecks and degradation risks.
Analyze test results to diagnose issues such as slow page loads, throughput constraints, and scaling limitations.
Document findings, dashboards, baselines, and recommendations to support ongoing operational maturity.

Benefits

Vantor offers a competitive total rewards package that goes beyond the standard, including a robust 401(k) with company match, mental health resources, and unique perks like student loan repayment assistance, adoption reimbursement and pet insurance to support all aspects of your life.
You can find more information on our benefits at: https://www.Vantor.com/careers
