User Guide
This guide explains how to configure anonymization options for an account in Admin. These settings control how Deepdesk removes or masks Personally Identifiable Information (PII) from messages during ingestion and how agent identity is displayed.
For the system architecture and processing flow, see Architecture › Anonymization.
Where to configure
In Admin, open the target account and navigate to the Anonymization section. Settings apply at the account level and affect both webhook and frontend ingestion paths.
After modifying anonymization options, click Save and then deploy the account to apply changes.
Note that changes affect newly ingested data only; stored historical data is not retroactively reprocessed.
Settings
The Anonymization panel contains the following options:
1) Anonymizer model
- Purpose: Select the underlying library or model used by the NER service to detect PII entities.
- Options:
- Flair — higher quality entity recognition, but requires GPUs to perform at production speed. Best for high‑accuracy needs; higher infra cost.
- spaCy — CPU‑friendly, faster and cheaper to run; typically lower recall/precision than Flair.
- Guidance:
- Choose Flair when compliance or data sensitivity demands stronger detection quality, and GPU capacity is available.
- Choose spaCy for cost‑efficient, low‑latency setups where approximate detection is acceptable.
- If GPUs are unavailable, avoid Flair in production due to latency and throughput constraints.
- Notes:
- Different models vary in accuracy, supported entity types, and performance.
- Changing the model affects future ingested messages; previously stored data is not retroactively reprocessed.
2) Anonymize messages
- Purpose: Toggles PII removal for incoming messages.
- When enabled, message text is sent to the NER service, detected PII entities are masked/removed, and only the anonymized text is stored/published.
- Scope: Applies to messages stored in Cloud SQL and to messages published to BigQuery for analytics and training.
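The masking step can be illustrated with a minimal sketch. It assumes the NER service returns character offsets and a label for each detected entity; the `Entity` type, the label names, and the `<LABEL>` placeholder format are hypothetical and are not taken from Deepdesk's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    start: int  # inclusive character offset (assumed response shape)
    end: int    # exclusive character offset
    label: str  # hypothetical label such as "PER" or "PHONE"


def mask_entities(text: str, entities: list[Entity]) -> str:
    """Replace each detected span with a placeholder like <PER>."""
    parts = []
    cursor = 0
    for ent in sorted(entities, key=lambda e: e.start):
        if ent.start < cursor:
            continue  # skip spans that overlap one already masked
        parts.append(text[cursor:ent.start])
        parts.append(f"<{ent.label}>")
        cursor = ent.end
    parts.append(text[cursor:])
    return "".join(parts)


message = "Hi, I am John Doe, call me on +3161234567."
name_start = message.index("John Doe")
phone_start = message.index("+3161234567")
detected = [
    Entity(name_start, name_start + len("John Doe"), "PER"),
    Entity(phone_start, phone_start + len("+3161234567"), "PHONE"),
]
masked = mask_entities(message, detected)
print(masked)  # Hi, I am <PER>, call me on <PHONE>.
```

Only the masked string would be stored or published in this sketch; the original text is never written anywhere.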
3) Anonymize messages — ignore URLs​
- Purpose: Controls whether URLs are excluded from PII removal.
- When set to Yes, URLs in the text are ignored by the anonymizer (left unchanged). This is useful if links are required for downstream workflows or auditing.
- When set to No, URLs may be masked if the model flags them as containing PII.
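One way to picture the "ignore URLs" behavior is as a filter that discards any detected entity overlapping a URL before masking. The sketch below is an assumption about how such a filter could work, not a description of the actual service; the URL pattern, the `(start, end, label)` tuple shape, and the function name are all hypothetical.

```python
import re

# Simplified URL pattern for illustration; real URL detection is more involved.
URL_RE = re.compile(r"https?://\S+")


def drop_entities_in_urls(text, entities, ignore_urls=True):
    """Return the entities to mask, skipping those inside URLs if requested.

    `entities` is a list of (start, end, label) tuples (assumed shape).
    """
    if not ignore_urls:
        return list(entities)
    url_spans = [m.span() for m in URL_RE.finditer(text)]

    def overlaps_url(start, end):
        return any(start < u_end and end > u_start for u_start, u_end in url_spans)

    return [e for e in entities if not overlaps_url(e[0], e[1])]


text = "Profile: https://example.com/users/john.doe (owner: John Doe)"
in_url = text.index("john.doe")
outside = text.index("John Doe")
entities = [(in_url, in_url + 8, "PER"), (outside, outside + 8, "PER")]

kept_yes = drop_entities_in_urls(text, entities, ignore_urls=True)
kept_no = drop_entities_in_urls(text, entities, ignore_urls=False)
```

With the setting at Yes, only the name outside the link remains a masking candidate; with No, both spans do.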
4) Anonymize messages — ignore list​
- Purpose: Provide phrases that must not be altered by anonymization.
- Input format: Comma‑separated values (CSV) on a single line. Example: John Doe,+3161234567
- Behavior: Exact substring matches in the incoming message are exempt from masking. Use with care to avoid leaking sensitive data.
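The ignore list can be sketched as two steps: parse the single CSV line, then locate every exact occurrence of each phrase so that those spans can be exempted from masking. Whitespace trimming, case-sensitive matching, and both function names are assumptions made for illustration; check how Admin treats spaces and casing before relying on them.

```python
import csv


def parse_ignore_list(raw: str) -> list[str]:
    """Split a single-line CSV value into phrases, dropping empty items."""
    row = next(csv.reader([raw]), [])
    return [item.strip() for item in row if item.strip()]


def protected_spans(text: str, phrases: list[str]) -> list[tuple[int, int]]:
    """Find (start, end) offsets of every exact, case-sensitive phrase match."""
    spans = []
    for phrase in phrases:
        start = text.find(phrase)
        while start != -1:
            spans.append((start, start + len(phrase)))
            start = text.find(phrase, start + 1)
    return sorted(spans)


phrases = parse_ignore_list("John Doe,+3161234567")
spans = protected_spans("Call John Doe on +3161234567", phrases)
```

Because matching is by substring, a short phrase can exempt more text than intended, which is one reason to keep the list small.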
5) Agent pseudonyms
- Purpose: Replace agent identifiers in the UI and logs with pseudonyms.
- When enabled, the agent ID, name, and email are replaced with pseudonymous values. This helps during privacy‑sensitive reviews, demos, and external analytics exports.
- Notes:
- Pseudonymization affects presentation; it does not change access control or underlying account membership.
- If you export raw datasets from analytics, confirm whether pseudonyms or real identifiers are included based on your export pipeline.
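This guide does not describe how pseudonymous values are generated. As a purely illustrative sketch, one common technique is a keyed hash (HMAC), which gives each agent a stable pseudonym that cannot be reversed without the secret. The function name, prefix, and truncation length below are hypothetical.

```python
import hashlib
import hmac


def pseudonym(value: str, secret: bytes, prefix: str = "agent") -> str:
    """Derive a stable pseudonym from an identifier using HMAC-SHA256."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    # Keep the first 8 hex characters for readability (illustrative choice;
    # shorter values increase the chance of collisions).
    return f"{prefix}-{digest[:8]}"


secret = b"example-secret"  # placeholder; a real key would come from a secret store
alias_a = pseudonym("jane.agent@example.com", secret)
alias_b = pseudonym("jane.agent@example.com", secret)
alias_c = pseudonym("other.agent@example.com", secret)
```

The same input always maps to the same pseudonym, so conversations by one agent remain linkable in reviews and exports without exposing the real identifier.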
Recommendations
- Start in a staging environment. Enable anonymization and validate that required downstream features (search, analytics, training) behave as expected with masked content.
- Use the ignore list sparingly and only for terms that must remain unaltered for business reasons.
- If URLs carry tracking or session parameters, prefer enabling “ignore URLs” to preserve link integrity.
- After changing the model, spot‑check a few conversations to verify entity coverage and false‑positive rates.
Testing the configuration
- Enable the desired options and deploy.
- Send a test conversation that includes sample PII (e.g., a name, phone number, email, and a URL).
- In Admin, open the conversation and verify the stored message text shows masked/removed PII according to your settings.
- Check analytics dashboards or BigQuery preview (if you have access) to confirm that exports are anonymized.
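The verification step can be partly automated with a small leak check. The sketch below assumes you have copied the stored message text out of Admin or a BigQuery preview by hand; it does not call any Deepdesk API. The regular expressions are deliberately rough and the function name is hypothetical.

```python
import re

# Rough patterns for illustration only; they will miss some formats.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{7,}\d")


def find_leaks(stored_text: str, planted_values: list[str]) -> list[str]:
    """Return planted PII values or PII-like strings still present in the text."""
    leaks = [value for value in planted_values if value in stored_text]
    leaks += EMAIL_RE.findall(stored_text)
    leaks += PHONE_RE.findall(stored_text)
    return sorted(set(leaks))


planted = ["John Doe", "+3161234567"]
clean = find_leaks("Hi <PER>, mail <EMAIL> or call <PHONE>", planted)
leaky = find_leaks("Hi John Doe, call +3161234567", planted)
```

An empty result suggests the planted values were masked; any returned string is worth inspecting manually, keeping in mind that entries on the ignore list and URLs (when ignored) are expected to remain.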