About The Position

Pindrop is redefining trust in the digital age. Our patented voice and video authentication, fraud detection, and deepfake detection technologies protect some of the world’s largest banks, insurers, retailers, and healthcare leaders. As AI-driven threats evolve in the form of synthetic voices, deepfakes, face swapping, and more, our solutions stay ahead, helping ensure that the real human and the right human are recognized. Pindrop is trusted by Fortune 500 enterprises to secure voice interactions, and with $100M ARR we’re entering our next phase of innovation and growth, backed by world-class investors including Andreessen Horowitz, IVP, and CapitalG.

Requirements

  • You have built and operated distributed production systems at scale.
  • You write systems that handle concurrency, duplication, and out-of-order events without surprising behavior.
  • You design for explicit failure modes, safe retries, and stable contracts.
  • You can design for predictable tail latency and controlled resource usage in real-time request paths.
  • You ship safely with staged rollouts, rollback readiness, and change discipline.
  • You use observability to reason about production and instrument systems for fast detection and diagnosis.
  • You can lead incident response and post-incident reviews and drive long-term reliability improvements.
  • You have managed model lifecycle in production: versioning, validation, staged rollout, rollback, and outcome-tied monitoring.
  • You have built or operated distributed training pipelines with reproducibility, lineage, and controlled promotion.
  • You understand drift and training-serving skew risks and mitigate them with contracts, tests, and monitoring.
  • You have built or operated real-time inference services in production.
  • You communicate clearly, document decisions, and drive alignment on tradeoffs and success criteria.
  • You can take ambiguous problems, define scope, and deliver steady progress while keeping the quality bar high.
  • You actively seek out and remove unnecessary complexity, understanding that simplicity is a prerequisite for reliability, security, and velocity at scale.
  • 5–7 years of software development experience.
  • Experience designing and implementing highly scalable cloud-based APIs.
  • Experience with multiple programming languages, such as Python and Go.
  • Expertise in data structures, algorithms, and concurrency.
  • Experience building and operating real-time distributed systems, including patterns for resilient services such as backpressure, idempotency, timeouts, and retry or circuit-breaking strategies.
  • 2+ years of experience with DevOps practices for deploying SaaS services, including hands-on experience with Jenkins and GitHub Actions, implementing and maintaining CI/CD pipelines, and managing applications in a multi-container environment such as Kubernetes.
  • Knowledge of different data storage technologies, such as Redis and MySQL.
  • Knowledge of Docker and container orchestration frameworks such as Kubernetes.
  • Experience developing and maintaining services using AWS native products such as Kinesis, DynamoDB, and S3.
  • Experience with observability and monitoring tools such as Prometheus, Grafana, and cloud logging and tracing.
  • Linux proficiency.

Nice To Haves

  • Familiarity with voice authentication, fraud detection, or deepfake detection is a plus, not a requirement.
  • Experience working with production ML systems and MLOps (for example, model deployment, feature pipelines, experiment tracking, and model or data quality monitoring) is a strong plus, but not required.

Responsibilities

  • Design model training and inference workflows with clear versioning, lineage, and promotion criteria where models are part of the system.
  • Define service responsibilities, interfaces, and data contracts that evolve safely.
  • Specify behavior under retries, timeouts, partial failures, and dependency degradation.
  • Choose consistency and durability guarantees that match risk, latency targets, and operational realities.
  • Design the request path for predictable tail latency and controlled resource usage.
  • Build and operate high-performance services and APIs that keep authentication reliable, secure, and fast at scale.
  • Implement distributed services that are safe under concurrency and robust to duplicate and out-of-order events.
  • Build real-time scoring and decision services with clear input/output contracts and bounded execution time.
  • Build distributed training pipelines that scale, are reproducible, and produce auditable artifacts.
  • Build pipelines that move data and model artifacts through validation, promotion, and release.
  • Define automated quality gates for service changes and releases.
  • Add checks for data quality, schema/contract adherence, and training-serving consistency where appropriate.
  • Define acceptance criteria tied to measurable outcomes and production behavior.
  • Ship changes with staged rollouts and rollback readiness as defaults.
  • Coordinate multi-service releases with clear cutover and recovery plans.
  • Use production signals to validate rollouts and trigger rollback when risk is high.
  • Instrument the full path with metrics, logs, and traces that enable fast detection and diagnosis.
  • Implement alerting that reflects user impact, not just component health.
  • Lead incident response for your services, restore service quickly, and communicate clearly during events.
  • Run post-incident reviews and close follow-ups that measurably reduce recurrence.
  • Drive reliability work through SLIs, SLOs, and error budgets, and make tradeoffs explicit.
  • Improve performance and cost through profiling, load testing, and capacity planning.
  • Raise engineering quality through reviews, standards, and simplification of operationally expensive designs.
  • Align across teams on interfaces, data contracts, and reliability expectations to reduce coordination friction.
  • Evaluate new approaches when they materially improve security, performance, delivery safety, or operational simplicity.

Benefits

  • Competitive compensation, including equity for all employees
  • Unlimited Paid Time Off (PTO)
  • Generous health and welfare plans to choose from - including one employer-paid “employee-only” plan!
  • Best-in-class Health Savings Account (HSA) employer contribution
  • Affordable vision and dental plans for you and your family
  • Employer-provided life and disability coverage with additional supplemental options
  • Paid Parental Leave - Equal for all parents, including birth, adoptive & foster parents
  • One year of diaper delivery for your newest addition to the family! It’s our way of welcoming new Pindroplets to the family!
  • Identity protection through Norton LifeLock
  • Recurring monthly Phone and Internet allowance
  • One-time home office allowance
  • Remote-first environment, meaning you have flexibility in your day!
  • Company holidays
  • Annual professional development and learning benefit
  • Pick your own Apple MacBook Pro
  • Retirement plan with competitive 401(k) match
  • Wellness Program, including an Employee Assistance Program and 24/7 telemedicine