AI Engineering Lead

DEUNA, San Francisco, CA

About The Position

Athia is DEUNA's AI-powered payment intelligence platform, moving from early ML experimentation to the critical infrastructure behind billions of dollars in annual transaction volume. We are looking for a hands-on Engineering Lead who can own the full technical stack: from model development and data pipelines to production payment orchestration, cloud/on-prem deployments, and real-time observability. This is not a coordination role. You will build, ship, and own. You will be the technical authority that bridges AI/ML systems with our core payments stack, leading both the platform engineering and the modeling lifecycle end-to-end.

Requirements

  • Go (Golang) — production-grade services
  • Python — ML pipelines, model serving, tooling
  • RESTful APIs and gRPC
  • Distributed systems & event-driven architecture
  • CI/CD, Docker, Kubernetes
  • Cloud platforms (AWS or GCP)
  • Hybrid / on-prem deployment patterns
  • PyTorch or TensorFlow — training & fine-tuning
  • scikit-learn, XGBoost, or tabular ML
  • MLflow, Weights & Biases, or equivalent
  • Feature engineering & feature stores
  • Model monitoring & drift detection
  • A/B testing and shadow deployment
  • Low-latency inference architectures
  • React and Next.js
  • TypeScript
  • Component design systems
  • API integration patterns
  • Prometheus, Grafana, or Datadog
  • Structured logging & distributed tracing
  • SQL and analytical query patterns
  • Data pipeline tooling (Airflow, dbt, etc.)
  • 6+ years in software engineering with strong backend foundations.
  • 2+ years in a Tech Lead or Staff Engineer role owning a production platform end-to-end.
  • Demonstrated experience shipping ML/AI systems to production — not just research or notebooks.
  • Background in payments, fintech, or high-transaction environments strongly preferred.
  • Experience with on-premise deployment or hybrid infrastructure for enterprise clients is a plus.
  • Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.

Responsibilities

  • Design, train, and fine-tune ML models for payment optimization use cases — including authorization rate improvement, dynamic routing, cost minimization, and fraud signal detection.
  • Select and apply the right frameworks (PyTorch, TensorFlow, scikit-learn) per model type and latency budget.
  • Own the model lifecycle: experimentation → offline evaluation → shadow deployment → A/B testing → production promotion.
  • Monitor and remediate model drift, data distribution shifts, and performance degradation proactively.
  • Define evaluation metrics that map directly to business KPIs (approval rate lift, GMV impact, provider cost).
  • Architect and build optimized data pipelines to collect, clean, and preprocess high-volume transaction data for model training and inference.
  • Design feature stores and real-time feature serving layers that keep inference latency within payments SLA requirements (<100 ms).
  • Establish data quality standards, schema validation, and lineage tracking across the ML data stack.
  • Partner with the Data Engineering team to ensure training data reflects the full distribution of providers, regions, and merchant types in our network.
  • Integrate ML model outputs into DEUNA's live payment routing and orchestration layer with zero tolerance for latency regressions or silent errors.
  • Develop and own the inference service layer in Go and Python, ensuring thread-safe, performant, and fault-tolerant operation under peak transaction load.
  • Lead the design of hybrid deployment architectures: cloud-native (AWS/GCP) and on-premise client environments, including secure bi-directional data synchronization.
  • Build and maintain RESTful and gRPC APIs that expose Athia capabilities to the broader DEUNA platform and external partners.
  • Own the full observability stack for Athia: real-time dashboards, alerting thresholds, anomaly detection, and post-incident reviews.
  • Implement model-specific monitoring (prediction distributions, confidence scores, provider error rates) alongside standard infrastructure metrics.
  • Create a fast feedback loop with the Operations team to detect and remediate routing degradation or GMV impact within SLA.
  • Define on-call runbooks and escalation paths that are clear, tested, and kept up to date.
  • Provide architectural guidance to scale Athia to handle 10M+ monthly transactions across concurrent global partner launches.
  • Lead and mentor engineers through architecture reviews, code reviews, technical planning, and day-to-day execution.
  • Drive engineering best practices: testing strategy (unit, integration, shadow), CI/CD pipelines, documentation standards, and security compliance.
  • Translate business and product goals into concrete technical roadmaps with realistic timelines and clear dependency mapping.