Anonymization

When enabled, ingested data is anonymized by removing Personally Identifiable Information (PII) before storage and any downstream processing (training datasets and live inference).

Events for analytics are anonymized as well.
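The anonymize-before-storage idea can be sketched as follows. This is a minimal illustration, not Deepdesk's implementation: the `Entity` type, the `detect_pii`, `anonymize`, and `ingest` functions, and the `<LABEL>` placeholder format are all hypothetical, and two regular expressions stand in for a real NER model.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Entity:
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # e.g. "EMAIL", "PHONE"


# Stand-in detector (assumption): a real deployment would call an NER model here.
_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def detect_pii(text: str) -> List[Entity]:
    """Return detected PII spans, ordered by start offset."""
    entities = [
        Entity(m.start(), m.end(), label)
        for label, pattern in _PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(entities, key=lambda e: e.start)


def anonymize(text: str, detect: Callable[[str], List[Entity]] = detect_pii) -> str:
    """Replace each detected span with a placeholder such as <EMAIL>."""
    parts, cursor = [], 0
    for ent in detect(text):
        if ent.start < cursor:  # skip spans overlapping one already replaced
            continue
        parts.append(text[cursor:ent.start])
        parts.append(f"<{ent.label}>")
        cursor = ent.end
    parts.append(text[cursor:])
    return "".join(parts)


def ingest(text: str, store: list, enabled: bool = True) -> None:
    """Anonymize first, then store, so raw PII never reaches storage."""
    store.append(anonymize(text) if enabled else text)
```

Because `ingest` calls `anonymize` before appending to the store, everything downstream (training data, live inference, analytics) only ever sees the placeholder text.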

Anonymization Flow

When to Use Anonymization

Enable anonymization for:

  • Customer data with strict privacy requirements
  • Compliance with GDPR or other privacy regulations
  • Highly sensitive conversations

Considerations:

  • Model quality may drop when the removed PII is contextually relevant
  • Anonymization is irreversible; the original values cannot be restored afterwards
  • It applies to both training data and live conversations

NER Model Options

Deepdesk's NER Service supports multiple underlying models. Choice impacts detection quality, latency, and infrastructure cost.

  • Flair

    • Quality: Higher precision/recall on common PII types.
    • Performance: Requires GPUs to meet production latency/throughput targets; CPU-only runs are typically too slow.
    • Cost: Higher (GPU instances).
    • When to choose: Strict compliance needs, high sensitivity to false negatives/positives, GPU capacity available.
  • spaCy

    • Quality: Generally lower than Flair; sufficient for many use cases.
    • Performance: CPU-friendly, faster to run without GPUs.
    • Cost: Lower (commodity CPU instances).
    • When to choose: Cost-efficiency and simplicity prioritized; approximate detection acceptable.
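The trade-offs above can be condensed into a small decision helper. This is only a sketch of the guidance in this section; the `Requirements` type, the `choose_ner_model` function, and the model identifiers `"flair"` and `"spacy"` as configuration values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    gpu_available: bool      # GPU capacity can be provisioned for the NER Service
    strict_compliance: bool  # false negatives/positives are costly


def choose_ner_model(req: Requirements) -> str:
    """Pick a model name following the 'When to choose' guidance."""
    if req.strict_compliance and req.gpu_available:
        return "flair"
    # Without GPUs, Flair is typically too slow for production, so fall back
    # to spaCy. Strict compliance without GPU capacity is a signal to
    # revisit infrastructure rather than a good fit for either option.
    return "spacy"
```

For example, `choose_ner_model(Requirements(gpu_available=True, strict_compliance=True))` selects Flair, while any CPU-only setup selects spaCy.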

Operational notes

  • The NER Service is model-pluggable: it invokes whichever model is configured for the account or environment.
  • Model changes affect newly ingested data only; stored historical data is not retroactively reprocessed.
  • Validate downstream tasks (search, analytics, training) after switching models to assess quality/latency impact.
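The first two notes can be illustrated with a short sketch. Everything here is hypothetical: the `NerConfig` class, the `StoredRecord` type, and in particular the precedence rule (an account setting overrides the environment default), which this page does not specify.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class StoredRecord:
    text: str
    model: str  # model that was configured when this record was ingested


class NerConfig:
    def __init__(self, env_defaults: Dict[str, str]) -> None:
        self._env_defaults = dict(env_defaults)
        self._account_overrides: Dict[str, str] = {}

    def set_account_model(self, account: str, model: str) -> None:
        self._account_overrides[account] = model

    def resolve(self, account: str, environment: str) -> str:
        # Assumed precedence: account override first, then environment default.
        return self._account_overrides.get(account, self._env_defaults[environment])


def ingest(text: str, account: str, environment: str,
           config: NerConfig, store: List[StoredRecord]) -> None:
    model = config.resolve(account, environment)
    # A real service would run the resolved model on `text` here;
    # this sketch only records which model applied at ingestion time.
    store.append(StoredRecord(text=text, model=model))


config = NerConfig({"production": "flair", "staging": "spacy"})
store: List[StoredRecord] = []
ingest("first message", "acme", "production", config, store)
config.set_account_model("acme", "spacy")  # switch models for one account
ingest("second message", "acme", "production", config, store)
```

After the switch, the first record is left as it was stored and only the second reflects the new model, which mirrors the note that historical data is not reprocessed.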