Infrastructure Operations Engineer

KAYAK | Concord, MA
$100,000 - $105,000 | Hybrid

About The Position

In this role, you'll join KAYAK's Operations team in our Concord, MA office and play a key role in the day-to-day support of our development and production environments. You'll work closely with engineering, security, and platform teams to keep our infrastructure reliable, performant, and ready to scale. If you’re a team player who loves to learn, engage with new tech, and help build better, we’d love to hear from you!

Requirements

  • A Bachelor’s degree in Computer Science, Information Systems, or a related technical field (or equivalent practical experience).
  • 3+ years of hands-on experience in an infrastructure, systems, or platform operations role in a production environment.
  • Strong working knowledge of Linux systems administration (RHEL, CentOS, Ubuntu, or similar) and proficiency in shell scripting (Bash); Python scripting experience is a strong plus.
  • Solid understanding of datacenter operations including physical/virtual server management, networking fundamentals (DNS, TCP/IP, load balancing), and storage systems.
  • Demonstrable experience with monitoring and observability tooling — such as LogicMonitor, Datadog, Prometheus, or equivalent — including alerting, dashboarding, and threshold tuning.
  • Hands-on experience with log aggregation and analysis tools such as Kibana / Elasticsearch (ELK stack).
  • Familiarity with ticketing and incident management workflows using Jira / Atlassian products or similar platforms (e.g., ServiceNow, PagerDuty).
  • Experience supporting or operating cloud infrastructure (AWS, GCP, or Azure) — including compute, storage, and networking services.
  • Working knowledge of containerization and orchestration technologies (Docker, Kubernetes) in a production context.
  • Exposure to infrastructure-as-code or configuration management tools (Terraform, Ansible, Chef, or similar).

Responsibilities

  • Receive, triage, and prioritize inbound tickets from developers and business teams, ensuring timely resolution and clear communication throughout the lifecycle of each request.
  • Serve as a primary point of contact for infrastructure-related incidents, driving root cause analysis (RCA) and implementing corrective actions to prevent recurrence.
  • Monitor, audit, and continuously improve the health and performance of our production hosting platform using tools such as LogicMonitor, Kibana, and Elasticsearch.
  • Proactively identify anomalies, performance bottlenecks, and capacity risks in our infrastructure, escalating and coordinating remediation with relevant engineering teams.
  • Maintain, test, and refine backup systems and data retention policies to ensure business continuity and compliance with internal standards.
  • Develop and document operational runbooks, standard operating procedures (SOPs), and post-incident reports to build institutional knowledge and improve team efficiency.
  • Collaborate cross-functionally with software engineering, security, and platform teams to support infrastructure changes, deployments, and release processes.
  • Contribute to longer-term infrastructure improvement projects — from automation initiatives to platform migrations — as a hands-on team resource.
  • Participate in on-call rotations to support 24/7 production environment availability, responding to critical alerts and escalations as needed.
  • Identify opportunities to automate repetitive operational tasks using scripting (Bash, Python, etc.) to reduce toil and improve team velocity.
  • Support capacity planning efforts by tracking resource utilization trends and making data-informed recommendations for scaling infrastructure.

Benefits

  • Work from (almost) anywhere for up to 20 days per year
  • Company-paid therapy sessions through Spring Health
  • Company-paid subscription to Headspace
  • One company-wide week off per year, so the whole team fully recharges (and returns without a pile-up of work!)
  • No-meeting Fridays
  • Paid parental leave
  • Generous paid vacation + time off for your birthday
  • Paid volunteer time
  • Development Dollars
  • Leadership development
  • Access to thousands of on-demand e-learnings
  • Travel discounts
  • Employee Resource Groups
  • Competitive retirement and health plans
  • Free lunch 2 days per week
  • Fun quarterly events such as boat trips, arcades, ski trips, Thursday happy hours, and more
  • Health benefits
  • Flexible spending account
  • Retirement benefits
  • Life insurance
  • Paid time off (including PTO, paid sick leave, medical leave, bereavement leave, floating holidays, and paid holidays)
  • Parental leave benefits