Performance & Reliability Engineer

ASM Research
San Antonio, TX

About The Position

You will play a crucial role in maintaining and enhancing the reliability, availability, and performance of our applications and services. You will apply your expertise in AWS operations, infrastructure as code, and deployment automation to streamline processes, reduce downtime, and improve overall system performance.

Requirements

  • Bachelor’s degree in Information Technology, Computer Science, or a related field, or equivalent relevant experience.
  • 0-3 years of experience in information technology, systems administration, or another IT-related field.
  • Strong expertise in AWS cloud services, including EC2, S3, RDS, and Lambda.
  • Proficiency in infrastructure as code tools such as Terraform, CloudFormation, or similar.
  • Experience with deployment automation tools and frameworks (e.g., Jenkins, Ansible, Puppet, Chef).
  • Solid understanding of monitoring, alerting, and logging tools (e.g., Dynatrace, Splunk, Prometheus, Grafana, ELK Stack).
  • Strong scripting and automation skills using languages such as Python, Bash, or PowerShell.
  • Excellent problem-solving and troubleshooting skills.
  • Strong communication and collaboration abilities.
  • Strong knowledge of Microsoft operating systems and products, including Microsoft Windows, Windows Server, Microsoft Office 365, SharePoint, and Microsoft Teams.
  • Ability to apply standard methodologies, techniques, procedures, and criteria.
  • Ability to analyze, troubleshoot, and resolve basic or routine system hardware, software, and networking problems.
  • Ability to plan and coordinate the deployment of new technology and resolve technical problems, both individually and as a project participant.
  • Ability to communicate effectively, both orally and in writing, and to translate technical terminology into terms that non-technical employees can understand.
  • Exceptional customer service skills.

Nice To Haves

  • Experience with cloud infrastructure, digital workspace, and storage technologies.

Responsibilities

  • Ensure the reliability, availability, and performance of applications and services through proactive monitoring, incident response, and capacity planning.
  • Manage and optimize AWS cloud infrastructure to support scalable and resilient application operations.
  • Develop, implement, and maintain infrastructure as code using tools such as Terraform, CloudFormation, or similar.
  • Automate deployment processes to ensure consistent and reliable delivery of software updates and infrastructure changes.
  • Collaborate with development teams to design and implement solutions that enhance system performance and reliability.
  • Conduct root cause analysis for incidents and implement strategies to prevent recurrence.
  • Establish and maintain monitoring, alerting, and logging frameworks to ensure visibility into system health and performance.
  • Participate in on-call rotations to provide 24/7 support for critical systems and applications.
  • Drive continuous improvement initiatives to enhance operational efficiency and reduce technical debt.