Infrastructure Engineer

DTEX Systems, Fremont, CA

About The Position

DTEX is seeking an experienced Site Reliability Engineer (SRE) with a strong software engineering background to help drive modernization of our infrastructure and operations. This is a high-impact role where you will design and implement automation solutions to manage customer environments and enable the business to scale beyond what manual operations allow. You will be instrumental in our efforts to transition from legacy operations to modern, automated Infrastructure as Code best practices, applying software engineering principles to solve complex operational problems and build resilient systems.

Requirements

  • 3+ years of hands-on experience managing production environments in AWS and/or GCP.
  • Strong proficiency in Python. Demonstrated ability to write clean, maintainable, and testable code to solve infrastructure problems.
  • Experience with Terraform, including best practices for state management and modular design in complex environments.
  • Strong knowledge of Linux internals and high competency in Bash scripting and command-line operations.
  • Proficiency with Ansible and/or SaltStack as configuration management tools.
  • Expert-level understanding of Git and collaborative workflows, such as branching strategies and code review best practices.
  • Hands-on experience using AI tools to solve SRE operational challenges. Proven experience developing agentic workflows to automate SRE tasks is a plus.
  • A strong desire to solve complex problems, the resilience to work through significant technical debt, and enthusiasm for driving cultural and technical change.
  • A desire to work in enterprise- and government-focused computing environments with robust security and reliability requirements.
  • MS/BS in Computer Science, Computer Engineering, or a related field of study (or equivalent experience).
  • Must meet all personnel screening requirements as specified by applicable federal contracts or agency regulations, which may include U.S. citizenship.
  • This position is open to U.S.-based candidates only. Unfortunately, we are unable to provide work visa sponsorship at this time.

Nice To Haves

  • Proven track record of transitioning legacy/manual operations environments to automated, IaC-driven approaches.
  • Experience with containerization using Docker or Kubernetes, and an understanding of how container orchestration is used in modern systems.
  • Experience building and managing CI/CD pipelines for infrastructure automation.
  • Familiarity with monitoring and observability tools such as Zabbix, Prometheus, and Grafana.
  • Experience operating and querying OpenSearch/Elasticsearch.

Responsibilities

  • Design, write, and maintain software, primarily in Python, to automate the provisioning, deployment, and configuration management of our infrastructure.
  • Contribute to the adoption and maturation of Terraform, establishing and maintaining best practices for state management, modularization, and version control.
  • Use Ansible and/or SaltStack to ensure consistency, repeatability, and standardization across all environments.
  • Develop robust CI/CD pipelines for both infrastructure and application deployments, replacing manual processes.
  • Implement and mature monitoring, logging, and alerting systems to proactively improve system reliability.
  • Participate in a follow-the-sun on-call rotation, focusing on sustainable incident response, blameless postmortems, and continuous improvement.
  • Champion SRE principles, automation, and coding best practices within the team and across the organization.

Benefits

  • Competitive compensation
  • Equity participation
  • Health and wellness benefits
  • Generous time-off policies