Observability Engineer

Vantor Services•McLean, VA

43d•$180,000 - $220,000•Onsite

About The Position

Vantor is forging the new frontier of spatial intelligence, helping decision makers and operators navigate what’s happening now and shape what’s coming next. Vantor is a place for problem solvers, changemakers, and go-getters, where people work together to help our customers see the world differently and, in doing so, be seen differently. Come be part of a mission, not just a job, where you can shape your own future, build the next big thing, and change the world.

To be eligible for this position, you must be a U.S. Person, defined as a U.S. citizen, permanent resident, asylee, or refugee.

Note on Cleared Roles: If this position requires an active U.S. Government security clearance, applicants who do not currently hold the required clearance will not be eligible for consideration. Employment for cleared roles is contingent upon verification of clearance status.

Export Control/ITAR: Certain roles may be subject to U.S. export control laws, requiring U.S. person status as defined by 8 U.S.C. 1324b(a)(3).

Please review the job details below. This position requires an active U.S. Government Security Clearance at the TS/SCI level with required polygraph.

We are looking for a full-time Observability Engineer (OE) to gain deeper insights into complex systems and cloud-native environments. This role is part of our data collection and software development team, which ensures that Vantor’s services meet reliability and uptime standards appropriate to customers’ needs. The environment calls for a fast rate of improvement while keeping an ever-watchful eye on capacity, performance and cost. The OE will have the mindset and the engineering approaches to understand “the what” and “the why”. They will build monitoring solutions to gain visibility into operational problems, ensuring that customer value and satisfaction are achieved. Their focus is to drive observability and monitoring for new and existing systems in order to provide systems insight and resolve application and infrastructure issues. The successful candidate has a breadth of knowledge to discover, implement and collaborate with teammates on solutions for complex problems across the entire technology stack.

Requirements

US citizenship required
Active/current TS/SCI with required polygraph
Bachelor's degree in computer science or related area of study
Minimum 5 years of experience
Working knowledge of K8s, Docker, Helm and automated deployment via pipeline (e.g. Concourse or Jenkins)
Familiarity with distributed version control systems such as Git
Experience with AWS cloud services
Experience setting up monitoring and observability solutions across sponsor-owned systems, tools and data feeds
Proficient in scripting with Python and Java
Willingness to work onsite full time
Ability and willingness to share on-call responsibilities
Advanced knowledge of Unix/Linux systems, with high comfort level at the command line

Nice To Haves

Experience with other cloud services providers beyond AWS
Experience with CloudWatch or other monitoring tools inside of AWS
Familiarity with Prometheus/Grafana or other monitoring tools for ETL feeds, APIs, servers, C2S services, networks and AI/ML capabilities
Good understanding of networking fundamentals
Organized with an ability to document and communicate ongoing work tasks and projects
Receptive to giving, receiving and implementing feedback in a highly collaborative environment
Understanding of Incident and Problem Management
Ability to prioritize work effectively and encourage best practices in others
Meticulous and cautious with the ability to identify and consider all risks and balance those with performing the task efficiently
Experience with Root Cause Analysis (RCA)
Experience with ETL processes
Willingness to step in as a leader to address ongoing incidents and problems, while providing guidance to others in order to drive to a resolution

Responsibilities

Define standards for monitoring the reliability, availability, maintainability and performance of sponsor-owned and operated systems.
Design and architect operational solutions for managing applications and infrastructure.
Drive service acceptance by adopting new processes into operations, developing new monitoring to expose risks, and automating repeatable actions.
Partner with service and product owners to establish key performance indicators to identify trends and achieve better outcomes.
Provide deep troubleshooting for production issues.
Engage with service owners to maximize a team’s ability to identify and remediate the root causes of performance issues quickly, ensuring rapid recovery from service interruptions.
Build and/or use tools to correlate disparate data sets in an efficient and automated way, helping teams quickly identify the root cause of issues and understand how different problems relate to each other.
Coordinate with the sponsor to support major incidents, large-scale deployments and SecOps user support.

Benefits

Vantor offers a competitive total rewards package that goes beyond the standard, including a robust 401(k) with company match, mental health resources, and unique perks like student loan repayment assistance, adoption reimbursement and pet insurance to support all aspects of your life.
You can find more information on our benefits at: https://www.Vantor.com/careers
