Xgrid.co • Posted about 2 months ago
Full-time • Mid Level
San Jose, CA
51-100 employees

We are seeking a System Design Architect to lead the architecture, design, and evolution of modern, scalable, and resilient workflow platforms for our clients. As a trusted technology partner to several enterprises, we help organizations design distributed systems that are fault-tolerant, event-driven, and cloud-native. You’ll play a key role in driving these architectural engagements: defining patterns, mentoring teams, and ensuring solutions are robust and maintainable at scale. Direct experience with specific workflow orchestration technologies is a strong advantage, but we value system design expertise, distributed systems fundamentals, and the ability to learn quickly above all.

Responsibilities

  • Lead end-to-end system design and architecture for workflow and automation platforms across client engagements.
  • Design and implement distributed, event-driven systems with high scalability, availability, and fault tolerance.
  • Define architecture blueprints, reference implementations, and best practices for workflow orchestration and stateful service design.
  • Collaborate with client engineering teams to evaluate, onboard, and scale workflow solutions.
  • Guide decisions around data consistency, reliability patterns (sagas, retries, compensation), and observability.
  • Conduct architecture reviews and provide technical governance across multiple concurrent projects.
  • Partner with internal solution teams to establish accelerators, templates, and frameworks for rapid client adoption.
  • Mentor engineers and provide architectural leadership within the organization.

Requirements

  • 8+ years of experience in software architecture, backend design, or distributed systems.
  • Proven experience designing microservice-based or event-driven architectures.
  • Deep understanding of scalability, reliability, consistency models, and system resiliency.
  • Proficiency in one or more languages such as Go, Java, Python, or TypeScript.
  • Strong understanding of messaging, streaming, and asynchronous communication (e.g., Kafka, RabbitMQ, Pub/Sub).
  • Experience with cloud-native infrastructure (Kubernetes, Docker, CI/CD, observability).
  • Solid background in API design, workflow modeling, or automation systems.

Nice to have

  • Experience with workflow orchestration platforms or stateful orchestration frameworks (e.g., Temporal, Cadence, Airflow, Step Functions).
  • Understanding of orchestration vs. choreography, activity/task design, and failure handling patterns.
  • Experience running or optimizing workflow or event-processing clusters in production environments.
  • Familiarity with Kubernetes operators, service meshes, and observability stacks (Grafana, Prometheus, OpenTelemetry).