Building GDPR-Compliant AI from Scratch: Our Technical Approach
Why We Built Everything from Scratch
When we started Garnet AI, we had a choice: use existing AI APIs (OpenAI, Google, etc.) or build our own proprietary engine. We chose the harder path — and here's why.
The Problem with Third-Party AI APIs
Using third-party AI APIs for compliance document processing creates several issues:
- Data sovereignty: Vendor compliance documents contain sensitive information. Sending them to US-hosted APIs means data leaves the EU jurisdiction.
- GDPR compliance: Under GDPR, processing personal data outside the EU requires specific legal mechanisms (SCCs, adequacy decisions). For sensitive compliance data, this creates unnecessary risk.
- Data retention: Most API providers retain input data for model improvement. For compliance documents, this is unacceptable.
- Audit trail: Regulators want to know exactly how data was processed. With third-party APIs, you have limited visibility into the processing pipeline.
Our Architecture
Garnet's AI stack is built on three proprietary components:
1. OCR Engine
Our OCR engine is purpose-built for compliance documents. Unlike general-purpose OCR (Tesseract, Google Vision), it understands:
- Multi-column layouts common in audit reports
- Table structures in control matrices
- Watermarks and redactions without misinterpreting them
- Low-quality scans from documents that have been printed, signed, and re-scanned
On our compliance document benchmarks, it reaches 99.4% character accuracy.
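To make the layout-awareness concrete, here is a minimal sketch of how a layout-aware OCR pass might represent its output so that watermarks and redactions are kept separate from readable content. All names here (Region, OcrPage, body_text) are illustrative, not Garnet's actual API:

```python
from dataclasses import dataclass, field

# Hypothetical region types a layout-aware OCR pass might emit.
@dataclass
class Region:
    kind: str          # "column", "table", "watermark", or "redaction"
    text: str          # recognized text ("" for redactions)
    confidence: float  # per-region character confidence, 0.0-1.0

@dataclass
class OcrPage:
    number: int
    regions: list[Region] = field(default_factory=list)

    def body_text(self) -> str:
        """Join readable regions, skipping watermarks and redactions
        so they are not misread as document content."""
        return "\n".join(
            r.text for r in self.regions
            if r.kind in ("column", "table") and r.text
        )

page = OcrPage(1, [
    Region("column", "Control CC6.1 was operating effectively.", 0.99),
    Region("watermark", "CONFIDENTIAL", 0.95),
    Region("redaction", "", 1.0),
])
print(page.body_text())  # watermark and redaction are excluded
```

The point of the explicit `kind` field is that downstream analysis can ignore watermark text instead of treating "CONFIDENTIAL" as part of the report body.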
2. Compliance AI Model
Our AI model is trained specifically on compliance document structures:
- SOC 2 Type I and Type II reports
- ISO 27001 certificates and statements of applicability
- Penetration test reports (various frameworks)
- Data Processing Agreements
- Bridge letters and management assertions
The model understands context, not just keywords. It knows that "qualified opinion" in the auditor's opinion section of a SOC 2 report has very different implications from "qualified" in a job description.
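As a toy sketch of that difference: a naive keyword matcher flags every occurrence of "qualified", while a section-aware check only treats it as an exception in the auditor's opinion. The section names and the one-line rule below are hypothetical illustrations, not the model's actual logic:

```python
# Illustrative comparison: bare keyword search vs. section-aware matching.

def keyword_flags(sections: dict[str, str]) -> list[str]:
    """Naive approach: flag any section containing 'qualified'."""
    return [name for name, text in sections.items()
            if "qualified" in text.lower()]

def context_flags(sections: dict[str, str]) -> list[str]:
    """Context-aware: 'qualified' only signals an exception when it
    modifies the opinion in the auditor's opinion section."""
    return [name for name, text in sections.items()
            if name == "auditor_opinion"
            and "qualified opinion" in text.lower()]

report = {
    "auditor_opinion": "In our opinion ... we express a qualified opinion.",
    "system_description": "Staff are qualified security professionals.",
}
print(keyword_flags(report))  # both sections match: one false positive
print(context_flags(report))  # only the genuine exception
```

A real model learns this distinction statistically rather than from a hand-written rule, but the failure mode it avoids is exactly the false positive the keyword matcher produces here.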
3. EU-Sovereign Infrastructure
All processing happens on EU-hosted infrastructure:
- Zero data retention: Documents are processed in-memory and purged immediately
- No external API calls: Everything runs on our own infrastructure
- Full audit trail: Every processing step is logged for regulatory reporting
- Encryption in transit and at rest: AES-256 encryption throughout
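The zero-retention and audit-trail points above can be sketched together: log what happened to a document (identified only by a content hash) without ever persisting the document itself. This is a minimal illustration under those assumptions; function names and the stand-in analysis result are hypothetical:

```python
import hashlib
import json
from datetime import datetime, timezone

# Minimal sketch of a zero-retention processing step with an audit trail.
AUDIT_LOG: list[dict] = []

def log_step(step: str, doc_digest: str) -> None:
    """Record what happened and when, but never the document content."""
    AUDIT_LOG.append({
        "step": step,
        "doc_sha256": doc_digest,  # content hash, not content
        "at": datetime.now(timezone.utc).isoformat(),
    })

def process_document(data: bytes) -> dict:
    digest = hashlib.sha256(data).hexdigest()
    log_step("received", digest)
    result = {"pages": 1, "exceptions": []}  # stand-in for real analysis
    log_step("analyzed", digest)
    del data                   # drop the in-memory reference;
    log_step("purged", digest) # nothing was ever written to disk
    return result

result = process_document(b"%PDF-1.7 example bytes")
print(json.dumps(AUDIT_LOG, indent=2))
```

The audit log can be handed to a regulator: it shows every processing step and timestamp, and the SHA-256 digest proves which document was processed without retaining a byte of it.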
The Trade-offs
Building proprietary means:
- Slower initial development: We spent months building what an API integration could have delivered in days
- Higher infrastructure costs: Running our own GPU clusters isn't cheap
- Smaller model: Our model is far smaller than GPT-4, but it is more accurate on compliance documents
The trade-off is worth it. Our customers' data never leaves the EU, we have full control over the processing pipeline, and we can provide complete audit trails to regulators.
What's Next
We're continuously improving our models with structured feedback from alpha users. Every false positive and missed exception makes the system more accurate.
The goal: 99%+ exception detection rate with zero data leaving the EU.