Anonymization
When enabled, anonymization removes Personally Identifiable Information (PII) from ingested data before it is stored or used downstream, including in training datasets and live inference.
Events sent for analytics are anonymized as well.
Anonymization Flow
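The core step of this flow, redacting detected PII before data is stored, can be illustrated with a minimal sketch. It assumes an NER model has already returned non-overlapping character spans with entity labels. The `Entity` type, the `anonymize` function, and the `[LABEL]` placeholder format are hypothetical and are not Deepdesk's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A PII span detected by an NER model (hypothetical shape)."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # entity type, e.g. "PER" or "LOC"


def anonymize(text: str, entities: list[Entity]) -> str:
    """Replace each detected entity span with a [LABEL] placeholder.

    Assumes spans do not overlap. Spans are replaced from right to left
    so that earlier offsets stay valid after each substitution.
    """
    result = text
    for ent in sorted(entities, key=lambda e: e.start, reverse=True):
        result = result[:ent.start] + f"[{ent.label}]" + result[ent.end:]
    return result


if __name__ == "__main__":
    message = "Hi, I'm Jane Doe and I live in Amsterdam."
    spans = [Entity(8, 16, "PER"), Entity(31, 40, "LOC")]
    print(anonymize(message, spans))  # prints: Hi, I'm [PER] and I live in [LOC].
```

Because no mapping from placeholders back to the original values is kept, the redaction cannot be undone, which is why anonymization is irreversible.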
When to Use Anonymization
Enable anonymization for:
- Customer data with strict privacy requirements
- GDPR/privacy regulation compliance
- Highly sensitive conversations
Considerations:
- May reduce model quality if PII is contextually relevant
- Irreversible: removed PII cannot be recovered afterwards
- Applies to both training data and live conversations
NER Model Options
Deepdesk's NER Service supports multiple underlying models. The choice of model affects detection quality, latency, and infrastructure cost.
- Flair
  - Quality: Higher precision/recall than spaCy on common PII types.
  - Performance: Requires GPUs to meet production latency/throughput targets; CPU-only runs are typically too slow.
  - Cost: Higher (GPU instances).
  - When to choose: Strict compliance needs, high sensitivity to false negatives/positives, GPU capacity available.
- spaCy
  - Quality: Generally lower than Flair; sufficient for many use cases.
  - Performance: CPU-friendly, faster to run without GPUs.
  - Cost: Lower (commodity CPU instances).
  - When to choose: Cost-efficiency and simplicity prioritized; approximate detection acceptable.
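The "When to choose" guidance for the two models can be condensed into a small decision helper. This is a sketch under stated assumptions: `NerModel`, `choose_ner_model`, and its two boolean inputs are hypothetical names, and a real choice should also rest on measured quality and latency for your own data.

```python
from enum import Enum


class NerModel(Enum):
    """Hypothetical identifiers for the two supported NER backends."""
    FLAIR = "flair"
    SPACY = "spacy"


def choose_ner_model(strict_compliance: bool, gpu_available: bool) -> NerModel:
    """Pick an NER backend following the trade-offs listed above.

    strict_compliance: detection quality matters more than cost
        (strict compliance, high sensitivity to false negatives/positives).
    gpu_available: GPU capacity exists to run Flair at production latency.
    """
    if strict_compliance and gpu_available:
        return NerModel.FLAIR
    # Either cost-efficiency is the priority, or Flair would have to run on
    # CPUs, which is typically too slow for production. spaCy is the
    # practical choice, so its lower detection quality must be acceptable.
    return NerModel.SPACY
```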
Operational notes
- The NER Service is model-pluggable: it invokes whichever model is configured for the account or environment.
- Model changes affect newly ingested data only; stored historical data is not retroactively reprocessed.
- Validate downstream tasks (search, analytics, training) after switching models to assess quality/latency impact.
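One common way to build a model-pluggable service like this is a registry of detectors plus a per-account configuration lookup. The sketch below assumes that design; `MODELS`, `ACCOUNT_MODEL`, `anonymize_for_account`, and the keyword-matching toy detector are hypothetical stand-ins, not the NER Service's real interface. Each result records which model produced it, which makes it easy to tell which stored records were processed before a model switch, since those are not reprocessed.

```python
from dataclasses import dataclass
from typing import Callable

# A detector maps text to (start, end, label) spans; hypothetical signature.
Detector = Callable[[str], list[tuple[int, int, str]]]


def keyword_detector(keyword: str, label: str) -> Detector:
    """Toy detector that tags every occurrence of a fixed keyword."""
    def detect(text: str) -> list[tuple[int, int, str]]:
        spans = []
        start = text.find(keyword)
        while start != -1:
            spans.append((start, start + len(keyword), label))
            start = text.find(keyword, start + len(keyword))
        return spans
    return detect


# Hypothetical registry: both entries use the same toy detector here;
# in practice each would wrap a real Flair or spaCy pipeline.
MODELS: dict[str, Detector] = {
    "flair": keyword_detector("Jane", "PER"),
    "spacy": keyword_detector("Jane", "PER"),
}
# Hypothetical per-account configuration with a fallback default.
ACCOUNT_MODEL: dict[str, str] = {"acme": "spacy"}
DEFAULT_MODEL = "spacy"


@dataclass(frozen=True)
class AnonymizedRecord:
    text: str
    model: str  # name of the NER model that anonymized this record


def anonymize_for_account(account: str, text: str) -> AnonymizedRecord:
    """Anonymize text with the model configured for the given account."""
    model_name = ACCOUNT_MODEL.get(account, DEFAULT_MODEL)
    spans = MODELS[model_name](text)
    # Replace right to left so earlier offsets remain valid.
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return AnonymizedRecord(text=text, model=model_name)
```

Switching an account to another model is then a configuration change (for example, `ACCOUNT_MODEL["acme"] = "flair"`) that only affects text anonymized afterwards.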
Related
- See the ingestion pipeline overview: Data Ingestion
- Administration pages: