Staff Engineer

Workato • Palo Alto, CA

Remote

About The Position

Workato delivers enterprise infrastructure for the agentic era, redefining iPaaS and helping enterprises unify data, applications, processes, and AI into a single, governed platform. A leader in Enterprise MCP and trusted by 50% of the Fortune 500, Workato’s cloud-native architecture connects every application, data source, and process to power real-time orchestration at scale. With enterprise-grade security and continuous innovation at its core, Workato provides the trusted foundation for organizations to automate with confidence and operationalize AI across the business.

Requirements

Bachelor’s degree (or foreign equivalent) in Computer Science, Management, or a closely related field
5 years of progressively responsible experience in the job offered or a related occupation
3 years of experience with Rust, including Tokio, asynchronous programming, concurrency, performance optimization, and allocator profiling
2 years of experience with Apache DataFusion and Apache Arrow, including Parquet, data pipelines, query planning, and vectorized execution
3 years of experience creating integration tests with real dependencies using Docker and Testcontainers
2 years of experience with behavior-driven testing for distributed services using tools such as Cucumber with Gherkin specifications
2 years of experience with performance benchmarking, including throughput and latency analysis, regression detection, and capacity planning
2 years of experience with load testing using Locust and wrk, including test scenario design, ramp-up strategies, and analysis of latency, throughput, and error rates
1 year of experience with chaos engineering and fault injection for distributed systems, including network partitions, process termination, and resource pressure testing for resilience validation
2 years of experience designing and scaling distributed backend services, including rate limiting, fair queuing, back-pressure control, cluster coordination, gossip-based membership protocols (e.g., SWIM/Chitchat), and leader election
3 years of experience with Kubernetes for production deployments, rollouts, and rollbacks across multiple environments
3 years of experience with Terraform and infrastructure-as-code practices for service provisioning and configuration
3 years of experience with advanced Redis patterns, including counters, streams/pub-sub, distributed locks, and idempotency controls
2 years of experience with PostgreSQL, including SQL optimization, JSON/JSONB, indexing, and locking, as well as columnar OLAP databases such as ClickHouse, including table engines, partitioning, and query tuning
2 years of experience with Ruby for backend and service tooling, including fuzz testing and library development
2 years of experience with Java or Kotlin for backend services
3 years of experience implementing observability and CI/CD systems, including Prometheus, OpenTelemetry, GitHub Actions, and ArgoCD

Responsibilities

Design and develop production-grade distributed services in Rust using async/Tokio, with focus on concurrency, performance, and scalability
Own the full service lifecycle from system design and implementation through deployment and operations
Build and optimize data-processing and transformation pipelines with emphasis on throughput, latency, and memory efficiency
Create and maintain integration tests with real service dependencies in containerized environments
Improve test determinism, stability, and reliability across distributed systems
Deploy and operate services across development, staging, and production environments using infrastructure-as-code practices
Implement safe rollout and rollback procedures using GitOps and CI/CD workflows
Develop and evolve observability systems including logs, metrics, and distributed tracing
Define service-level objectives (SLOs), configure alerts, and lead incident response and post-incident reviews
Design and maintain distributed cluster coordination systems using gossip-based membership and leader-election mechanisms for resilience and scalability
Plan and execute performance benchmarking and load testing, including capacity modeling and regression detection
Drive performance optimization initiatives across distributed services
Apply fuzz testing techniques to critical components to improve reliability and security
Practice chaos engineering in lower environments through fault injection, network partitioning, and resource pressure testing to validate resilience and recovery objectives
Participate in architecture reviews and code reviews
Contribute to technical design documents and RFCs
Mentor peers and collaborate cross-functionally on service integrations and stateful components