Senior Software Engineer CTJ Poly

Microsoft, Reston, VA
3d

About The Position

The scale of our operations is enormous. We need people who enjoy analyzing complicated problems, coming up with creative solutions, and working in focused teams to build things no one has thought of before, all in the service of production reliability.

Requirements

  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • These requirements include, but are not limited to, the following specialized security screenings: The successful candidate must have an active U.S. Government Top Secret Clearance with access to Sensitive Compartmented Information (SCI) based on a Single Scope Background Investigation (SSBI) with Polygraph. Failure to obtain or maintain the appropriate U.S. Government clearance and/or customer screening requirements may result in employment action up to and including termination.
  • Clearance Verification: This position requires successful verification of the stated security clearance to meet federal government customer requirements. You will be asked to provide clearance verification information prior to an offer of employment.
  • 5+ years of experience with PowerShell, C#, or C++.
  • Experience working on large-scale distributed services with on-call responsibilities.
  • Ability to build and influence broadly towards common goals and priorities.
  • Ownership for end-to-end project lifecycle with solid project management and communication skills.

Responsibilities

  • Applies debugging tools and examines logs, telemetry, and other methods to verify assumptions through writing and developing code proactively before issues occur and reactively as issues occur for products.
  • Conducts retrospective debugging of solutions to identify root causes of problems.
  • Maintains operations of live service as issues arise on a rotational, on-call basis.
  • Implements solutions and mitigations to more complex issues impacting performance or functionality of Live Site service and escalates as necessary.
  • Reviews and writes issue postmortems and shares insights with the team.
  • Acts as a Designated Responsible Individual (DRI) and guides other engineers by developing and following the playbook, working on call to monitor system/product/service for degradation, downtime, or interruptions.
  • Alerts stakeholders as to status and initiates actions to restore system/product/service for simple problems and complex problems when appropriate.
  • Responds within Service Level Agreement (SLA) timeframe.
  • Drives efforts to reduce incident volume, looking globally at incidents and providing broad resolutions.
  • Escalates issues to appropriate owners.
  • Drives efforts to integrate instrumentation for gathering telemetry data on system behavior such as performance, reliability, availability, usage, and safety mechanisms.
  • Drives sustained feedback loops from telemetry that inform subsequent designs.
  • Creates outputs of telemetry such as notifications or dashboards.
  • Drives efforts to collect, classify, and analyze data on a range of metrics (e.g., health of the system, where bugs might be occurring).
  • Drives the refinement of products through data analytics and makes informed decisions in engineering products through data integration.
  • Builds, enhances, reuses, contributes to, and identifies new software developer tools to support other programs and applications to create, debug, and maintain code for products.
  • Uses open source when possible.
  • Begins to develop skills in other tools outside areas of expertise.
  • Identifies internal tools and creates tools that will be useful for creating the product, determining if methods are still applicable for the current solution.
  • Shares best practices and teaches others about new tools and strategies.
  • Defines and develops standardized, repeatable, scalable solutions to guarantee quality.