About The Position

Every time someone taps, swipes, or clicks to pay, Visa infrastructure makes it happen in milliseconds, across 200+ countries. As a Software Development Engineer on the Product Reliability Engineering (PRE) team, you won’t just watch those systems run; you’ll be one of the engineers building, automating, and evolving them. PRE is not a traditional ops team. We are a software engineering organization that treats infrastructure as code, reliability as a product, and automation as a strategic advantage. You’ll write Python, build agentic AI tools, manage data platforms, and contribute to the distributed systems that process billions of real-time transactions. From day one, you are an engineer, and from day one, your work matters. If you are endlessly curious about how large-scale systems stay resilient, obsess over elegant automation, and want to launch your career at the intersection of AI, infrastructure, and global financial technology, this role was built for you.

Requirements

  • Bachelor's degree or 3+ years of relevant work experience.
  • Solid foundations in data structures, algorithms, and systems design: you can reason about complexity, tradeoffs, and failure modes.
  • Proficiency in Python and comfort writing scripts or tools in at least one additional language (Go, Java, or Bash).
  • Foundational understanding of relational databases (RDBMS): SQL, data modeling, query optimization, and database connectivity troubleshooting.
  • Familiarity with Linux/Unix environments and meaningful command-line fluency.
  • Exposure to cloud platforms (AWS, GCP, or Azure) and a conceptual understanding of containerization (Docker, Kubernetes).
  • Understanding of CI/CD principles and how modern software delivery pipelines are structured and maintained.
  • Genuine curiosity about GenAI platforms and agentic systems (OpenAI, Anthropic Claude, LangChain, or similar): hands-on exposure is a plus; intellectual interest is a must.

Nice To Haves

  • Bachelor’s degree in Computer Science, Software Engineering, or a related technical field (2023–2025 graduates preferred; December 2025 graduates welcome).
  • Hands-on experience with infrastructure-as-code tools (Terraform, Ansible, or Pulumi), even from coursework, a capstone, or an internship.
  • Experience with database CI/CD tooling, particularly Liquibase for schema change management across environments.
  • Experience with observability tooling: Prometheus, Grafana, Splunk, ELK, or Datadog.
  • Database administration exposure: backup/recovery procedures, performance tuning, index management, or replication monitoring.
  • Familiarity with Git workflows and modern DevOps toolchains (Jenkins, GitHub Actions, ArgoCD).
  • Academic or project experience with ML frameworks: scikit-learn, PyTorch, or LangChain / LangGraph.
  • Understanding of networking fundamentals: DNS, load balancing, service mesh, or TCP/IP.
  • A GitHub profile, personal project, hackathon entry, or open-source contribution that shows us how you think and build.

Responsibilities

  • Design and ship end-to-end automation for deployment pipelines, infrastructure provisioning, and release orchestration — code that runs millions of times so engineers never have to repeat themselves.
  • Write clean, production-grade Python (and Go or Bash where it counts) to eliminate toil, reduce manual intervention, and make systems self-managing.
  • Develop modular frameworks for release scheduling, validation, rollback, and reporting that integrate across the full software delivery lifecycle.
  • Support the build, deployment, and operations of relational database systems, contributing to schema design, architecture decisions, and solution engineering for critical payment data infrastructure.
  • Gain exposure to real-time event streaming architectures that support payment processing at scale.
  • Perform database health operations including patching, upgrades, backups, and recovery to maintain the availability and integrity of tier-1 production databases.
  • Optimize query performance through index tuning, execution plan analysis, and replication monitoring — targeting metrics like query execution time, CPU usage, and replication latency.
  • Automate database tasks and configuration management using tools like Ansible and Liquibase, and contribute to CI/CD pipelines that govern schema changes through TEST and PROD environments safely.
  • Build predictive and reactive monitoring dashboards for database anomalies, surfacing health signals before they become incidents.
  • Build GenAI-powered engineering assistants that automate deployment orchestration, release governance, and environment lifecycle management.
  • Integrate LLMs into observability, incident response, and developer support workflows, transforming reactive operations into proactive, AI-driven intelligence.
  • Contribute to prompt engineering, model fine-tuning, and agentic automation initiatives that position PRE as one of the most AI-forward reliability organizations in financial technology.
  • Build dashboards, alerts, and metrics using Prometheus, Grafana, Splunk, or ELK that give engineers real-time clarity on complex, globally distributed systems.
  • Analyze system performance and availability data and turn insights into infrastructure improvements that prevent incidents before they occur.
  • Contribute to self-healing and auto-scaling capabilities that keep critical payment infrastructure resilient without human intervention.
  • Ensure infrastructure and data platforms meet security and compliance standards across cloud-native deployments supporting global financial services at scale.
  • Support zero-downtime deployment strategies and high-availability architectures that Visa’s partners and billions of cardholders depend on around the clock.
  • Participate in threat modeling, vulnerability remediation, and audit readiness activities as part of a team that treats security as a first-class engineering concern.
  • Embed within Agile squads, working alongside senior engineers, product managers, and global PRE peers across sprint planning, reviews, and release discussions.
  • Document runbooks, SOPs, and engineering guides that make the team smarter, faster, and more autonomous over time.
  • Participate in on-call rotations (with robust support structures and mentorship) to build the incident response instincts that distinguish great reliability engineers.

Benefits

  • Medical
  • Dental
  • Vision
  • 401(k)
  • FSA/HSA
  • Life Insurance
  • Paid Time Off
  • Wellness Program