Staff / Principal Data Engineer

AppGate Cybersecurity, Inc.•New York, NY

$180,000 - $270,000•Onsite

About The Position

We are building an AI-native data platform that powers fraud detection and response across 360 Fraud Protection. We are hiring a Staff or Principal Data Engineer to own the unified data platform and data lake at the heart of that work. You will work hands-on and own the domain end-to-end, alongside a small group of senior engineers, data scientists and product partners. Every detection model and downstream AI capability depends on this data foundation, which makes this one of the highest-leverage engineering roles on the team. A stronger, broader and more reliable fraud signal directly improves detection accuracy, reduces customer losses and protects brand trust.

Requirements

Extensive experience building and operating large-scale data platforms and data lakes, with comfort working at high data volumes.
Deep, hands-on expertise with Apache Spark, Apache Flink and modern big-data systems.
Proven command of best practices for building and maintaining data pipelines in both batch and streaming modes.
Strong production engineering skills across the full delivery lifecycle, including Kubernetes and CI/CD tooling, with the ability to ship end-to-end.
A track record of owning data infrastructure end-to-end with limited supervision.

Nice To Haves

Experience with generative AI and embedding models, including embedding pipelines, vector databases and retrieval.
A cybersecurity or threat intelligence background, with hands-on exposure to threat types such as phishing, mobile threats and malware.
Familiarity with transaction data and transaction fraud signals.

Responsibilities

Own the design, build and operation of the data lake and ingestion platform end-to-end, from architecture through production reliability.
Build low-latency batch and streaming pipelines that ingest signals from internal and external sources, normalize them to a common schema, enrich them with context and serve model-ready data to the layers above.
Make adding a new data source a routine task rather than a project, so our view of risk keeps widening over time.
Establish data quality, freshness, completeness, lineage and observability so the platform is trustworthy enough to automate on top of.
Build data pipelines that ground generative AI, including unstructured text and threat intelligence processing, embedding generation, vector storage and retrieval.
Own deployment, CI/CD and operational reliability of the platform on Kubernetes.
Partner with data science, product and architecture to turn the platform into a shared foundation across 360 Fraud Protection.