Vantor Services - posted 8 days ago
$180,000 - $220,000/Yr
Full-time • Mid Level
Onsite • McLean, VA

Vantor is forging the new frontier of spatial intelligence, helping decision makers and operators navigate what’s happening now and shape what’s coming next. Vantor is a place for problem solvers, changemakers, and go-getters, where people are working together to help our customers see the world differently, and in doing so, be seen differently. Come be part of a mission, not just a job, where you can shape your own future, build the next big thing, and change the world.

To be eligible for this position, you must be a U.S. Person, defined as a U.S. citizen, permanent resident, asylee, or refugee.

Note on Cleared Roles: If this position requires an active U.S. Government security clearance, applicants who do not currently hold the required clearance will not be eligible for consideration. Employment for cleared roles is contingent upon verification of clearance status.

Export Control/ITAR: Certain roles may be subject to U.S. export control laws, requiring U.S. person status as defined by 8 U.S.C. 1324b(a)(3). Please review the job details below.

This position requires an active U.S. Government security clearance at the TS/SCI level with required polygraph.

We are looking for a full-time Observability Engineer (OE) to gain deeper insights into complex systems and cloud-native environments. This role is part of our data collection and software development team, which ensures that Vantor’s services meet reliability and uptime standards appropriate to customers’ needs. The environment calls for a fast rate of improvement while keeping an ever-watchful eye on capacity, performance and cost. The OE will have the mindset and the set of engineering approaches needed to understand “the what” and “the why”. They will build monitoring solutions to gain visibility into operational problems, ensuring that customer value and satisfaction are achieved.
Their focus is to drive observability and monitoring for new and existing systems in order to provide systems insight and resolve application and infrastructure issues. The successful candidate has the breadth of knowledge to discover and implement solutions for complex problems across the entire technology stack, in collaboration with teammates.

  • Define standards for monitoring the reliability, availability, maintainability and performance of sponsor-owned and operated systems.
  • Design and architect operational solutions for managing applications and infrastructure.
  • Drive service acceptance by adopting new processes into operations, developing new monitoring to expose risks, and automating repeatable actions.
  • Partner with service and product owners to establish key performance indicators to identify trends and achieve better outcomes.
  • Provide deep troubleshooting for production issues.
  • Engage with service owners to maximize a team’s ability to identify and remediate the root causes of performance issues quickly, ensuring rapid recovery from service interruptions.
  • Build and/or use tools to correlate disparate data sets in an efficient and automated way, helping teams quickly identify the root cause of issues and understand how different problems relate to each other.
  • Coordinate with the sponsor to support major incidents, large-scale deployments and SecOps user support.
  • US citizenship required
  • Active/current TS/SCI with required polygraph
  • Bachelor's degree in computer science or related area of study
  • Minimum 5 years of experience
  • Working knowledge of K8s, Docker, Helm and automated deployment via pipeline (e.g. Concourse or Jenkins)
  • Familiarity with distributed version control systems such as Git
  • Experience with AWS cloud services
  • Experience setting up monitoring and observability solutions across sponsor-owned systems, tools and data feeds
  • Proficient in scripting with Python and Java
  • Willingness to work onsite full time
  • Ability and willingness to share on-call responsibilities
  • Advanced knowledge of Unix/Linux systems, with high comfort level at the command line
  • Experience with other cloud services providers beyond AWS
  • Experience with CloudWatch or other monitoring tools inside of AWS
  • Familiarity with Prometheus/Grafana or other monitoring tools for ETL feeds, APIs, servers, C2S services, networks and AI/ML capabilities
  • Good understanding of networking fundamentals
  • Organized with an ability to document and communicate ongoing work tasks and projects
  • Receptive to giving, receiving and implementing feedback in a highly collaborative environment
  • Understanding of Incident and Problem Management
  • Effectively prioritize work and encourage best practices in others
  • Meticulous and cautious with the ability to identify and consider all risks and balance those with performing the task efficiently
  • Experience with Root Cause Analysis (RCA)
  • Experience with ETL processes
  • Willingness to step in as a leader to address ongoing incidents and problems, while providing guidance to others in order to drive to a resolution
  • Vantor offers a competitive total rewards package that goes beyond the standard, including a robust 401(k) with company match, mental health resources, and unique perks like student loan repayment assistance, adoption reimbursement and pet insurance to support all aspects of your life.
  • You can find more information on our benefits at: https://www.Vantor.com/careers