Visa · Posted 3 months ago
$124,700 - $180,650/Yr
Full-time • Mid Level
Hybrid • Austin, TX
5,001-10,000 employees
Credit Intermediation and Related Activities

Visa's Technology Organization is a community of problem solvers and innovators reshaping the future of commerce. We operate the world's most sophisticated processing networks, capable of handling more than 65,000 secure transactions a second across 80 million merchants, 15,000 financial institutions, and billions of everyday people. Working with us, you will tackle complex distributed systems and massive-scale problems centered on new payment flows, business and data solutions, cybersecurity, and B2C platforms. As a Staff Site Reliability Engineer in Product Reliability Engineering, you will be part of a team that maintains and supports open-source Hadoop, big data, Kafka, and cloud platforms, ensuring their availability, performance, and reliability and improving operational efficiency. You will be responsible for driving innovation for our partners and clients, both within Visa and globally.

Responsibilities:
  • Design, build, and manage Hadoop and Kafka clusters on cloud platforms (AWS, GCP, and Azure).
  • Manage and optimize open-source Apache Hadoop, big data, and Kafka clusters for high performance, reliability, and scalability.
  • Develop tools and processes to monitor and analyze system performance and to identify potential issues.
  • Collaborate with other teams to design and implement solutions that improve the reliability and efficiency of the on-premises Hadoop and big data platforms and the cloud platforms.
  • Perform effective root cause analysis of major production incidents and develop learning documentation.
  • Plan and perform capacity expansions and upgrades in a timely manner to avoid any scaling issues and bugs.
  • Tune alerting and set up observability to proactively identify issues and performance problems.
  • Create standard operating procedure documents and guidelines on effectively managing and utilizing the platforms.
  • Leverage DevOps tools, disciplines (incident, problem, and change management), and standards in day-to-day operations.
  • Develop tools and automation with Ansible, Python, Java, or another programming language.
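To illustrate the kind of monitoring and alerting automation the responsibilities above describe, here is a minimal sketch of a threshold-based alert evaluator for cluster metrics. This is not part of the posting; the metric names and limits are hypothetical, and a real implementation would pull metrics from the team's observability pipeline.

```python
# Illustrative sketch only: a tiny threshold evaluator of the kind an SRE
# might build when tuning alerting for Kafka/Hadoop clusters.
# Metric names and limits below are hypothetical examples.

def evaluate_alerts(metrics, thresholds):
    """Return (metric, value, limit) tuples for every breached threshold."""
    alerts = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        if value is not None and value > limit:
            alerts.append((name, value, limit))
    return alerts

# Hypothetical broker metrics sampled from a monitoring system.
sample = {"disk_used_pct": 91.5, "under_replicated_partitions": 0, "consumer_lag": 12000}
limits = {"disk_used_pct": 85.0, "under_replicated_partitions": 0, "consumer_lag": 50000}

print(evaluate_alerts(sample, limits))  # only disk usage breaches its limit
```

In practice, such a check would run on a schedule and feed an alerting tool rather than print to stdout.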
Basic Qualifications:
  • 5 or more years of relevant work experience with a Bachelor's degree, or at least 2 years of work experience with an advanced degree (e.g., Master's, MBA, JD, MD), or 0 years of work experience with a PhD.
  • Expertise in one or more general development languages (e.g., Java, Python) or full stack development.
  • Experience collaborating with Engineering, Application and Other functional teams.
Preferred Qualifications:
  • 6 or more years of work experience with a Bachelor's degree, or 4 or more years of relevant experience with an advanced degree (e.g., Master's, MBA, JD, MD), or up to 3 years of relevant experience with a PhD.
  • Experience managing and optimizing Hadoop, big data, and Kafka clusters in production environments.
  • Demonstrated experience with AWS EMR and MSK, and with GCP-hosted Hadoop and Kafka cloud platforms.
  • Proficiency in at least one programming language such as Python or Java, or in full stack development.
  • Familiarity with big data tools (e.g., Spark, Kafka) and frameworks (e.g., HDFS, MapReduce).
  • Strong knowledge of system architecture and design patterns for high-performance computing.
  • Good understanding of data security and privacy concerns.
  • Excellent problem-solving and troubleshooting skills.
  • Strong communication and collaboration skills.
  • Knowledge of observability tools such as Grafana, Opera, and Splunk.
  • Understanding of Linux, networking, CPU, memory, and storage.
Benefits:
  • Medical
  • Dental
  • Vision
  • 401(k)
  • FSA/HSA
  • Life Insurance
  • Paid Time Off
  • Wellness Program