Tista Science And Technology Corporation - posted 4 months ago
Full-time • Senior
Remote • Austin, TX
Professional, Scientific, and Technical Services

Are you a Senior Systems Administrator who would like to have a positive impact on millions of people? If so, we may have an opportunity for you! TISTA associates enjoy above Industry Healthcare Benefits, Remote Working Options, Paid Time Off, Training/Certification opportunities, Healthcare Savings Account & Flexible Savings Account, Paid Life Insurance, Short-term & Long-term Disability, 401K Match, Tuition Reimbursement, Employee Assistance Program, Paid Holidays, Military Leave, and much more!

  • Proactively monitor system health, availability, and performance using observability tools (e.g., Prometheus, Grafana, Datadog, Splunk).
  • Respond to alerts and incidents, triage issues, and perform root cause analysis (RCA).
  • Lead on-call rotations to ensure 24/7 uptime and quick recovery from outages.
  • Document incident reports and contribute to postmortems to prevent recurrence.
  • Automate manual operational tasks such as deployments, scaling, and configuration using tools like Ansible, Terraform, or Puppet.
  • Manage infrastructure as code (IaC) to ensure consistency across environments.
  • Optimize CI/CD pipelines for reliable and repeatable software delivery.
  • Build self-healing systems to minimize downtime.
  • Conduct load and stress testing to validate system performance under peak demand.
  • Establish and enforce Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs).
  • Identify and reduce sources of latency, bottlenecks, and single points of failure.
  • Work with development teams to design reliability, scalability, and fault tolerance into customer servers.
  • Patch operating systems, containers, and dependencies to address vulnerabilities.
  • Ensure compliance with organizational and regulatory requirements.
  • Implement access controls, secrets management, and least-privilege principles.
  • Monitor resource utilization (CPU, memory, storage, network) to anticipate scaling needs.
  • Plan for growth by forecasting demand and preparing infrastructure accordingly.
  • Optimize cloud costs by rightsizing instances, using autoscaling, and leveraging reserved/spot instances.
  • Partner with software engineers to embed reliability practices into development.
  • Mentor teams on best practices for observability, automation, and incident handling.
  • Participate in blameless postmortems and contribute to knowledge-sharing sessions.
  • Continuously evaluate new tools and technologies to improve system reliability.
  • Design, monitor, and maintain Customer Servers to meet VA's 99.9%+ uptime and SLA requirements across multi-cloud and hybrid environments.
  • Implement fault-tolerant and self-healing architectures leveraging automation.
  • Develop and manage observability frameworks (logging, metrics, tracing) to detect, respond to, and remediate incidents quickly.
  • Lead blameless postmortems and drive corrective actions to strengthen VAEC resilience.
  • Engineer scalable automation pipelines for provisioning, patching, and compliance (e.g., Ansible, Terraform, Puppet, GitHub Actions).
  • Reduce manual effort through self-service tools for operations teams.
  • Monitor and optimize application and infrastructure performance to meet demand from VA Medical Centers, Enterprise Data Warehouses, and end users.
  • Ensure latency, throughput, and resource utilization align with mission needs.
  • Integrate VA 6500, NIST 800-53, FedRAMP, and Zero Trust requirements into daily operations.
  • Partner with cybersecurity teams to enforce continuous ATO (cATO) practices and vulnerability remediation.
  • Collaborate with Release Management, Engineering, and Operations teams to improve change management, deployment pipelines, and reliability practices.
  • Drive the adoption of SRE principles (error budgets, SLIs, SLOs, SLAs) into VA's IT Service Management (ITSM) processes.
  • Operate across VA's Enterprise Cloud (VAEC), on-premises data centers, and hybrid platforms, ensuring seamless integration and interoperability.
  • Support workloads across AWS GovCloud, Microsoft Azure Government, and Oracle Cloud Infrastructure (OCI) where applicable.
  • 5 years of experience in Site Reliability Engineering, DevOps, or Systems Engineering.
  • Strong experience with Linux/Unix systems administration and troubleshooting.
  • Proficient with cloud platforms (AWS and/or Azure), especially in deploying production workloads.
  • Deep understanding of monitoring, metrics, alerting, and observability.
  • Proficient in designing, implementing, and managing automation solutions using Ansible.
  • Experience with CI/CD tools (e.g., GitHub Actions, Jenkins, GitLab CI, Azure DevOps).
  • Hands-on with containers and orchestration (Docker, Kubernetes, EKS, AKS).
  • Familiarity with networking concepts (TCP/IP, DNS, TLS, VPCs, load balancing).
  • Solid understanding of software development lifecycle (SDLC) and Agile methodologies.
  • Comfortable participating in on-call rotations and handling high-priority incidents.
  • AWS Certified SysOps Administrator or DevOps Engineer.
  • Microsoft Certified: Azure Administrator Associate or DevOps Engineer Expert.
  • Certified Kubernetes Administrator (CKA).
  • Experience in chaos engineering, capacity modeling, or SRE tooling.
  • Excellent analytical and problem-solving skills.
  • Ability to work in cross-functional teams and communicate effectively with developers, operations, and leadership.
  • A strong bias for automation and self-healing systems.
  • Ownership mindset with a commitment to reliability and continuous improvement.
  • Above Industry Healthcare Benefits
  • Remote Working Options
  • Paid Time Off
  • Training/Certification opportunities
  • Healthcare Savings Account & Flexible Savings Account
  • Paid Life Insurance
  • Short-term & Long-term Disability
  • 401K Match
  • Tuition Reimbursement
  • Employee Assistance Program
  • Paid Holidays
  • Military Leave