Axiom Software Solutions Limited, posted 5 months ago
101-250 employees

We are seeking a Site Reliability Engineer with Fidelity experience to join our team on a contract basis. The role is fully remote and requires a strong background in managing Kubernetes environments and building scalable infrastructure. You will design, implement, and manage Kubernetes environments, develop comprehensive monitoring and alerting strategies, analyze system performance bottlenecks, handle incident response, and work with development teams to improve application reliability.

Responsibilities:
  • Design, implement, and manage Kubernetes environments, covering deployment, configuration, monitoring, and troubleshooting
  • Build and maintain scalable, reliable infrastructure following infrastructure-as-code principles
  • Develop comprehensive monitoring solutions and implement alerting strategies
  • Identify and analyze system performance bottlenecks and implement improvements
  • Implement and maintain CI/CD pipelines for seamless deployments
  • Lead incident response and root cause analysis, and implement preventive measures
  • Create and enhance automation tools, leveraging AI/ML where applicable
  • Collaborate with development teams to improve application reliability and performance

Requirements:
  • 5-7 years of experience in SRE or DevOps roles
  • Strong expertise with Kubernetes ecosystem and container orchestration
  • Deep understanding of Linux/Unix operating systems and performance analysis tools (e.g., NMON)
  • Experience with log analysis, monitoring systems, and observability tools
  • Proficiency in database administration and performance tuning (Oracle, SQL Server)
  • Strong programming skills in at least one of: Python, Go, Java, or Node.js
  • Experience developing automation tools and frameworks
  • Proven track record of proactive problem identification and resolution
  • Experience with AI/ML integration into operational workflows
  • Cloud platform experience (AWS, GCP, Azure)
  • Knowledge of service mesh technologies
  • Experience with distributed systems architecture
  • Familiarity with security best practices and compliance requirements