Technical
Deep dives into PII detection, NER, and anonymization technology
32 articles
Cross-Platform PII: Mac, Linux, and Windows
Privacy officers on Mac, legal on Windows, data engineers on Linux: all processing the same data with different tools. Here's why OS-agnostic detection…
Cross-Application PII: Word, Chrome, and AI
Customer data flows from browser research to Word drafts to Claude prompts. Each context switch is a potential leakage point.
GDPR in App Logs: JSON PII Compliance
Application logs contain customer email addresses, IP addresses, and account numbers, all subject to the storage limitation principle in GDPR Article 5(1)(e).
GDPR Log Anonymization: Keep Debugging
Application logs silently accumulate user emails, IPs, and account numbers. Here's how to share logs with third parties, contractors, and observability…
Document Format Fragmentation in PII Tools
A single DSAR response may span Word contracts, PDF invoices, Excel customer lists, and CSV exports. Using different tools for each format creates…
Why Binary PII Detection Fails Compliance
Detected/not-detected is insufficient for compliance contexts that require human judgment. Here's why confidence scoring transforms PII anonymization from…
Presidio: 3-Week Setup vs Managed PII
Microsoft Presidio has thousands of GitHub stars and hundreds of open issues. Setup complexity, PySpark integration overhead, and Python dependency…
6 Weeks to 3 Days: Managed PII Setup
Healthcare SaaS teams spend 6 weeks on self-hosted Presidio production deployment before switching to a managed API. The managed API replaces the deployment.
Free PII Detection Costs €13K/Year
Self-hosting Presidio requires 40-80 hours initial setup and 5-10 hours/month ongoing maintenance. At €100/hour engineering rates, that's €13,200+.
Presidio 22.7% Precision Problem
A 2024 benchmark found Presidio's person name recognizer achieves 22.7% precision in business documents — meaning 77.3% of detections are false positives.
Reproducible Privacy: ML Presets
ML training data anonymization must be consistent and reproducible. If data scientists A and B apply different entity types, training datasets are…
GDPR Pipeline: Anonymize Before Storage
dbt column tags are not GDPR compliance. Raw customer data hits your Snowflake warehouse unmasked before tag-based policies apply.
FOIA: Redaction from Weeks to Hours
The federal government spent an estimated $500M on FOIA processing in 2024, mostly manual redaction. ARPA-H explicitly sought AI redaction software to…
GDPR ML Training Data Anonymization
GDPR restricts using personal data for ML training beyond its original collection purpose. Data scientists relying on ad-hoc Python scripts create…
FOIA: 80% Faster with Batch Redaction
US federal agencies received 1.5 million FOIA requests in FY2024 at an average cost of $482 per request. Batch PII redaction reduces processing time from…
Presidio vs anonym.legal: Build vs Buy
Microsoft Presidio is technically free but costs 40-80 engineering hours to deploy properly. anonym.legal delivers the same ML accuracy as a managed SaaS.
Air-Gapped Privacy: Anonymize Offline
FedRAMP and ITAR environments have one thing in common: the cloud is not an option. Reversible pseudonymization under GDPR Art. …
The False Positive Tax on PII Tools
Presidio GitHub issue #1071 documents systematic false positives. A 2024 study found 22.7% precision in mixed-language enterprise datasets.
Arabic & Hebrew PII: Western Tools Fail
GDPR doesn't end at the Bosphorus. Arabic and Hebrew PII in EU business workflows is systematically unprotected. XLM-RoBERTa cross-lingual detection and…
Mixed-Language PII: Monolingual Tools Fail
72% of EU enterprises process documents in 3+ languages simultaneously. Mixed-language documents cause 45% higher PII miss rates in monolingual NER tools.
APAC PII: Thai, Indonesian, Vietnamese
A Singapore fintech processing 500,000 monthly support chats across 12 APAC languages found their English-only tool missed PII in 60% of non-English…
False Positives: Why ML Redaction Fails
A 2024 benchmark found Presidio generated 13,536 false-positive name detections across 4,434 samples, flagging pronouns, vessel names, and countries as…
ISO 27001 + ZK Cuts Vendor Assessment Time
A 2025 survey found 'lack of recognized security certification' was the #2 reason CISOs disqualify SaaS vendors. Here's what the ISO 27001 +…
ZK Architecture Shortens Sales Cycles
Enterprise vendor security questionnaires average 100+ questions. Zero-knowledge architecture answers the hardest ones definitively, and converts…
LastPass Breach: Vendor Security Lessons
LastPass encrypted their users' data. The vaults were still exfiltrated. 600K+ Okta records followed. SaaS security incidents increased 300% from 2022 to…
Evaluating ZK Claims After LastPass
$438M stolen from LastPass users after their 'encrypted' vaults were breached. A £1.2M ICO fine followed. Here's the checklist for evaluating whether a…
LibreOffice PII Anonymization Extension
Step-by-step guide to anonymizing PII in LibreOffice documents using the anonym.legal extension.
LibreOffice vs Office: PII Redaction
Detailed comparison of PII anonymization capabilities in LibreOffice (anonym.legal extension) vs. Microsoft Office (Office Add-in).
Air-Gapped PII: Offline-First for Defense
41% of enterprise security policies prohibit cloud processing of classified documents.
Reversible vs Permanent Redaction Choice
GDPR distinguishes anonymization from pseudonymization. Courts need originals. Research needs re-identification. Learn when to use each approach.
Multi-Language NER: English Fails Arabic
English NER models achieve 85-92% accuracy. Arabic and Chinese? Often 50-70%. Learn about the technical challenges and how to build truly…
Use Claude & ChatGPT Without Leaking PII
A developer's guide to using AI assistants securely. Set up MCP Server integration for transparent PII protection in Claude Desktop, Cursor, and VS Code.