Operations-Focused Engineer

TTC Global, Houston, TX
Remote

About The Position

The Testing Consultancy (TTC) is a global software testing specialist focused on helping organizations transform the way they deliver quality software. Our capabilities span a wide range of testing areas, enabling clients to increase the speed and quality of software development while reducing risk and cost.

We’re looking for an operations-focused engineer to join our team. This role owns the day-to-day reliability and operational excellence of a portfolio of business-critical third-party enterprise platforms and integrations, working closely with engineering and cross-functional infrastructure teams to keep systems healthy, scalable, and secure.

Requirements

  • 3+ years of experience in production operations, SRE, systems engineering, or production support for enterprise services.
  • Strong Linux/systems troubleshooting skills (processes, logs, performance, networking basics).
  • Experience participating in or leading on-call and handling production incidents with clear communication.
  • Proficiency in scripting/automation (e.g., Python and/or shell) and comfort with change management / peer review workflows.
  • Strong written and verbal communication; able to write clear runbooks and incident summaries.

Nice To Haves

  • Experience operating third-party enterprise platforms (integration middleware, identity/auth systems, web/app tiers, databases, batch/scheduled jobs).
  • Familiarity with vulnerability remediation and patch management practices in production environments.
  • Demonstrated track record reducing operational toil and improving reliability metrics (MTTR, alert noise, incident recurrence).
  • Experience coordinating complex incidents across multiple teams and stakeholders.
  • Experience using Capirca for network provisioning, Chef for configuration management, and Infrastructure as Code (IaC) and containers for deployment.

Responsibilities

  • Serve in an on-call rotation and lead incident response for production issues: triage, mitigation, escalation, and restoration.
  • Drive operational excellence: improve alert quality, reduce toil, document runbooks, and create repeatable operational processes.
  • Perform root cause analysis for incidents and recurring issues; drive corrective and preventive actions to completion.
  • Execute and coordinate maintenance activities (upgrades, patching, configuration changes) with minimal risk and downtime.
  • Build and maintain monitoring, dashboards, and health checks to detect issues early and reduce mean time to recovery.
  • Automate routine operational workflows using scripts and small tools; improve reliability through safe incremental change.
  • Partner cross-functionally (security, networking, storage, compute, vendor/third-party partners) to resolve complex issues.
  • Maintain accurate system documentation, operational standards, and service ownership practices across supported platforms.

Benefits

  • Competitive Base Salary
  • Medical, Dental, Vision Benefits
  • 401(k) with Company Match
  • Paid Time Off
  • Paid Holidays
  • Work-Life Balance
  • Relaxed Work Environment
  • Growth and Development Opportunities

What This Job Offers

Job Type

Full-time

Career Level

Mid Level

Education Level

No Education Listed

Number of Employees

251-500 employees