Network and Infrastructure Expert

TOPPAN Security • Addis, LA

About The Position

Design, build, deploy, and maintain the organization's core IT systems (servers, networks, storage, and cloud services) to ensure reliable, efficient, and secure application delivery. The role focuses on automation, performance monitoring, security, and disaster recovery, and involves collaborating with development teams to support business goals. It also includes translating architectural plans into operational systems, managing infrastructure changes, and providing technical support.

Requirements

Degree in Computer Science, Computer Engineering, Software Engineering, or a related field from a recognized institution.
At least 3 years of experience handling enterprise-grade infrastructure and network engineering in production environments.
Technical Skills: Cloud platforms (AWS, Huawei Cloud Stack), virtualization (VMware, XenServer), networking (L2–L7), scripting (Python, Bash), Linux containerization and orchestration (Docker, Kubernetes cluster administration), observability tooling (Prometheus, Grafana, Loki, ELK), configuration management (Ansible), and GitOps.
Soft Skills: Problem-solving, communication, collaboration, strategic thinking, rigor

Responsibilities

Design & Implementation: Create and deploy scalable infrastructure (on-premises, cloud, and hybrid) by setting up bare-metal servers and virtual machines and configuring switches, firewalls, IP SANs, Kubernetes (K8s) clusters, and monitoring tools.
Maintenance & Support: Manage servers, networks, storage, and Linux and Windows operating systems; troubleshoot issues; and prepare root cause analyses (RCAs).
Monitoring & Observability: Build and maintain monitoring systems (metrics, logs, and traces) that provide deep insight into system health, including CPU, RAM, and disk usage, I/O statistics, network latency, and availability.
Automation: Implement Infrastructure as Code (IaC) with tools such as Ansible, automate provisioning, and deploy scripts for routine maintenance tasks.
Security: Implement security controls, manage firewalls, ensure compliance, and segment the network with proper IP planning, VLANs, and ACLs.
Performance & Reliability: Monitor system health and resource utilization, optimize performance, manage backup and disaster recovery strategies, test recovery plans, and proactively identify and resolve site reliability issues. Requires a solid understanding of high availability and fault tolerance.
Collaboration: Work with developers and architects to meet project needs.
Documentation: Create and maintain technical documentation and architecture diagrams.
