Design, develop, test, and deploy highly available, large-scale distributed infrastructure platforms.
Use and integrate non-proprietary tools and technologies, including Java, Go, Python, Docker, Kubernetes, Cassandra, Splunk, Prometheus, and Grafana.
Uphold software development lifecycle best practices, including coding standards, documentation, peer code review, automated testing, build pipelines, and deployment processes.
Measure key system indicators, including latency, throughput, error rates, availability, capacity, and resource utilization, and make data-driven decisions based on defined Service Level Objectives (SLOs).
Collaborate with infrastructure, security, and feature development teams to ensure secure, reliable, and performant service delivery, and train engineers on platform components and best practices.
Implement monitoring and observability solutions by building dashboards, alerts, and pipelines that ensure system reliability and support incident response.
Optimize the performance and reliability of traffic management systems by improving load balancing, service discovery, and routing strategies, and by applying algorithm optimization, concurrency, and parallelism techniques.
Ensure infrastructure security and compliance by remediating vulnerabilities, applying organizational security standards, and implementing encryption, authentication, and other security controls to protect data in transit.
Job Type: Full-time
Career Level: Senior
Number of Employees: 5,001-10,000