Site Reliability Engineer

Cogent People Inc.
Columbia, MD
Hybrid

About The Position

Cogent People Inc. is seeking a Site Reliability Engineer to support system reliability, monitoring, and operational stability across environments. This role is responsible for implementing observability and automation practices, supporting production systems, and ensuring system performance and availability. The position plays a key role in incident response, root cause analysis, and ongoing system optimization in collaboration with DevOps and development teams. The ideal candidate will bring experience in system monitoring, DevOps practices, and production support, along with the ability to work with cross-functional engineering teams in a fast-paced environment. This position may be contingent upon contract award.

Requirements

  • Bachelor’s degree in Computer Science, Information Systems, or a related field, or an equivalent combination of education and experience
  • Experience in system reliability, DevOps, or production support roles
  • Experience with monitoring, logging, and observability tools
  • Understanding of incident management and root cause analysis processes
  • Familiarity with cloud environments and infrastructure concepts
  • Experience supporting automated deployment or operational workflows
  • Strong problem-solving and troubleshooting skills
  • Excellent written and verbal communication skills
  • Ability to work effectively in fast-paced, production-critical environments
  • Strong collaboration skills across development and operations teams
  • Must be a U.S. Citizen, Permanent Resident, or valid EAD holder
  • Must have lived in the United States for at least 3 of the past 5 years
  • Must be currently authorized to work in the U.S. without sponsorship

Nice To Haves

  • Experience with AWS or other cloud platforms
  • Familiarity with infrastructure-as-code tools (e.g., Terraform or similar)
  • Experience with tools such as Splunk, Datadog, Prometheus, or similar observability platforms
  • Experience with CI/CD pipelines and DevOps automation tools
  • Prior experience supporting enterprise-scale or regulated environments
  • Knowledge of application performance tuning and distributed systems behavior

Responsibilities

  • Support system reliability, monitoring, and operational stability across environments
  • Implement and maintain observability practices, including monitoring, logging, and alerting
  • Contribute to automation efforts that improve system reliability and operational efficiency
  • Participate in incident response activities and production support
  • Perform root cause analysis for system issues and outages
  • Support performance optimization and tuning of applications and infrastructure
  • Work with DevOps and development teams to maintain production readiness
  • Contribute to continuous improvement of deployment and operational processes
  • Collaborate across engineering teams to support stable and scalable systems

Benefits

  • Medical, Dental, and Vision Insurance (comprehensive coverage)
  • 401(k) with company match
  • Company-paid life insurance
  • Short-term and long-term disability coverage
  • Paid Time Off: 3 weeks annually + 10 paid holidays
  • Employee assistance and wellness resources (as applicable)