Anonymization

When enabled, ingested data is anonymized by removing Personally Identifiable Information (PII) before storage and any downstream processing (training datasets and live inference).

Events for analytics are anonymized as well.
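
The removal step can be pictured as replacing each detected PII span with a placeholder before the text is stored. The sketch below is illustrative only: the `Entity` type, the `redact` function, the label names, and the `[LABEL]` placeholder format are assumptions made for this example, not Deepdesk's actual implementation, and the spans are written by hand rather than produced by a model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A detected PII span; offsets are character positions, end exclusive."""
    start: int
    end: int
    label: str  # e.g. "PER", "LOC" (illustrative label names)


def redact(text: str, entities: list[Entity]) -> str:
    """Replace each detected span with a placeholder such as [PER]."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e.start, reverse=True):
        text = text[:ent.start] + f"[{ent.label}]" + text[ent.end:]
    return text


message = "Hi, I am Anna Smith from Utrecht."
# Spans an NER model might return for this message (hand-written here).
detected = [Entity(9, 19, "PER"), Entity(25, 32, "LOC")]
anonymized = redact(message, detected)
print(anonymized)  # Hi, I am [PER] from [LOC].
```

Because only the redacted string is kept, the original values are gone once the input is discarded. The sketch assumes non-overlapping spans; overlapping detections would need to be merged first.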

Anonymization Flow

When to Use Anonymization

Enable anonymization for:

- Customer data with strict privacy requirements
- GDPR/privacy regulation compliance
- Highly sensitive conversations

Considerations:

- May reduce model quality when PII is contextually relevant
- Is irreversible: once PII has been removed, the original values cannot be recovered
- Applies to both training data and live conversations

NER Model Options

Deepdesk's Named Entity Recognition (NER) Service supports multiple underlying models. The choice of model affects detection quality, latency, and infrastructure cost.

Flair
- Quality: Higher precision/recall on common PII types.
- Performance: Requires GPUs to meet production latency/throughput targets; CPU-only runs are typically too slow.
- Cost: Higher (GPU instances).
- When to choose: Strict compliance needs, low tolerance for false negatives or false positives, GPU capacity available.

spaCy
- Quality: Generally lower than Flair; sufficient for many use cases.
- Performance: CPU-friendly, faster to run without GPUs.
- Cost: Lower (commodity CPU instances).
- When to choose: Cost-efficiency and simplicity prioritized; approximate detection acceptable.
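
One way to compare detection quality between two models is span-level precision and recall against a hand-labelled sample. The following is a minimal sketch under stated assumptions: the `precision_recall` helper, the `(start, end, label)` tuple format, and all spans below are hypothetical and made up for illustration. They are not measured results for Flair or spaCy.

```python
def precision_recall(predicted: set, gold: set) -> tuple[float, float]:
    """Exact-match span precision and recall; spans are (start, end, label)."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hand-labelled reference spans for one message (illustrative values only).
gold = {(9, 19, "PER"), (25, 32, "LOC"), (40, 52, "PHONE")}

# Hypothetical outputs from two candidate models on the same message.
model_a = {(9, 19, "PER"), (25, 32, "LOC"), (40, 52, "PHONE")}
model_b = {(9, 19, "PER"), (60, 65, "LOC")}

for name, spans in [("model_a", model_a), ("model_b", model_b)]:
    p, r = precision_recall(spans, gold)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

For anonymization, recall usually matters most, since a missed span (false negative) leaves PII in stored data, while a false positive only removes harmless text.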

Operational notes

- The NER Service is model-pluggable and invokes the configured model per account or environment.
- Model changes affect newly ingested data only; stored historical data is not retroactively reprocessed.
- Validate downstream tasks (search, analytics, training) after switching models to assess quality/latency impact.
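
The pluggable design can be sketched as a registry of detectors with a per-account setting. Everything in this example is hypothetical: the registry, the account names, the stub detectors, and in particular the precedence rule (an account setting overrides the environment default) are assumptions for illustration, since the actual configuration mechanism is not described here.

```python
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int, str]            # (start, end, label)
Detector = Callable[[str], List[Span]]


def flair_detector(text: str) -> List[Span]:
    """Stub standing in for a Flair-backed detector (no real model call)."""
    return []


def spacy_detector(text: str) -> List[Span]:
    """Stub standing in for a spaCy-backed detector (no real model call)."""
    return []


# Hypothetical registry and configuration; names are illustrative.
REGISTRY: Dict[str, Detector] = {"flair": flair_detector, "spacy": spacy_detector}
ENVIRONMENT_DEFAULT = "spacy"
ACCOUNT_MODELS = {"acme": "flair"}


def resolve_model_name(account: str) -> str:
    """Assumed precedence: account setting first, then environment default."""
    return ACCOUNT_MODELS.get(account, ENVIRONMENT_DEFAULT)


def resolve_detector(account: str) -> Detector:
    name = resolve_model_name(account)
    if name not in REGISTRY:
        raise ValueError(f"unknown NER model: {name!r}")
    return REGISTRY[name]


print(resolve_model_name("acme"))     # flair
print(resolve_model_name("globex"))   # spacy
```

In a design like this, changing `ACCOUNT_MODELS` only affects text processed after the change, which matches the note above that stored data is not reprocessed.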

Related

See the ingestion pipeline overview: Data Ingestion

Administration pages:
- Administration › Anonymization
- Administration › Anonymization User Guide
