NER Service
This document describes the architecture and role of the NER (Named Entity Recognition) Service in the Deepdesk platform.
Overview
The NER Service is responsible for detecting and extracting named entities from text. Named Entity Recognition (NER) is a natural language processing technique that identifies and classifies entities in text into predefined categories such as person names, organizations, locations, phone numbers, email addresses, and other identifiable information.
In Deepdesk, the NER Service powers two key features:
- Anonymization: Detecting and masking personally identifiable information (PII) before data is stored or processed
- Conversation Clipper: Extracting entities from conversations for easy copy-paste into CRM systems
[Figure: NER Service architecture diagram]
Named Entity Recognition Explained
Named Entity Recognition (NER) is an NLP task that scans text and identifies spans that refer to specific types of entities. The NER Service uses multiple detection methods:
- Flair NER model: ML-based detection of person names, locations, organizations, and miscellaneous entities (accounts can use spaCy instead; see Model Options)
- Regular expressions: Pattern matching for structured data (phone numbers, emails, IBANs, etc.)
- URL matching: Automatic detection of URLs
- Simple string matching: Fixed entities from a configured phrase list
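The two non-ML methods can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the service's code: the `Span` type, the simplified URL pattern, and the function names `match_urls` and `match_phrases` are all hypothetical. Phrases are matched case-insensitively as literal substrings, with no word-boundary check.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    start: int   # offset of the first character
    end: int     # offset just past the last character
    text: str
    label: str
    method: str

# Hypothetical, deliberately simple URL pattern: a scheme or "www." followed by non-space characters.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")

def match_urls(text: str) -> list[Span]:
    spans = []
    for m in URL_RE.finditer(text):
        # Drop trailing punctuation that usually ends the sentence rather than the URL.
        url = m.group().rstrip(".,;:!?)")
        spans.append(Span(m.start(), m.start() + len(url), url, "url", "URL matching"))
    return spans

def match_phrases(text: str, phrases: list[str]) -> list[Span]:
    spans = []
    for phrase in phrases:
        if not phrase:
            continue  # an empty phrase would match at every position
        # Case-insensitive literal match of the configured phrase; no word-boundary check.
        for m in re.finditer(re.escape(phrase), text, flags=re.IGNORECASE):
            spans.append(Span(m.start(), m.end(), m.group(), "phrase", "String matching"))
    return spans

text = "Visit https://example.com/help. Ask about Premium Plus today."
print([s.text for s in match_urls(text)])                       # ['https://example.com/help']
print([s.text for s in match_phrases(text, ["premium plus"])])  # ['Premium Plus']
```

Returning character offsets rather than bare strings lets results from every method be merged and masked in place later.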
Example
Given the text:
"Hi, my name is John Smith and you can reach me at john@example.com or 555-123-4567."
The NER Service would extract:
| Text | Label | Detection Method |
|---|---|---|
| John Smith | PER | Flair NER |
| john@example.com | ema | Regex |
| 555-123-4567 | pho | Regex |
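The two regex rows in this example can be reproduced with patterns along the lines below. The patterns are assumptions chosen to fit this sentence (the phone pattern only accepts the US-style 3-3-4 layout of `555-123-4567`); the service's actual `ema` and `pho` expressions are not documented here.

```python
import re

# Assumed patterns for illustration only: a loose email shape and a US-style 3-3-4 phone number.
PATTERNS = {
    "ema": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "pho": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}

def regex_entities(text: str) -> list[tuple[str, str]]:
    """Return (matched text, label) pairs ordered by their position in the text."""
    found = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append((m.start(), m.group(), label))
    return [(value, label) for _, value, label in sorted(found)]

text = "Hi, my name is John Smith and you can reach me at john@example.com or 555-123-4567."
print(regex_entities(text))
# [('john@example.com', 'ema'), ('555-123-4567', 'pho')]
```

"John Smith" is absent from the output because person names come from the ML model, not from a regex.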
Entity Labels
| Label | Description | Method |
|---|---|---|
| PER | Person name | Flair NER |
| LOC | Location | Flair NER |
| ORG | Organization | Flair NER |
| MISC | Miscellaneous | Flair NER |
| pho | Phone number | Regex |
| ema | Email address | Regex |
| pc | Postal code | Regex |
| str | Street address | Regex |
| iban | IBAN number | Regex |
| amt | Amount | Regex |
| date_ | Date | Regex |
| xdig | Three or more digits | Regex |
| url | URL | URL matching |
| phrase | Configured phrase | String matching |
Use Cases
Anonymization
When anonymization is enabled, the NER Service detects PII entities and replaces them with placeholder tokens before data is stored. This ensures sensitive information never persists in Deepdesk's systems.
Conversation Clipper
The Conversation Clipper extracts meaningful entities from customer conversations, allowing agents to quickly copy information like names, email addresses, order numbers, and phone numbers into their CRM or ticketing system without manual transcription.
Model Options
The NER Service supports two underlying models, configured per account:
spaCy (Local Processing)
- Runs within the NER Service container
- CPU-friendly, no GPU required
- Lower latency for simple workloads, since no call to the remote NER Model Service is needed
- Suitable for general use cases where lower precision than Flair is acceptable
Flair via NER Model Service (Remote Processing)
- Centrally hosted in the regional GCP project
- Requires GPU for production-level performance
- Higher precision and recall for PII detection
- Recommended for strict compliance requirements
The choice between spaCy and Flair is typically made during account setup based on compliance requirements, cost considerations, and accuracy needs. See the Anonymization User Guide for configuration details.
Business Logic Layer
The NER Service applies configurable business logic on top of raw entity detection:
Anonymization Rules
- Ignore List: Specific terms that should never be anonymized (e.g., company names, product names)
- Entity Type Selection: Configure which entity types to anonymize (e.g., anonymize person names (PER) and phone numbers (pho) but not organizations (ORG))
- Replacement Patterns: How entities are masked (e.g., `[PERSON]`, `[PHONE]`, or `***`)
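A minimal sketch of how the three rules could combine, assuming the detectors have already produced non-overlapping `(start, end, label)` character spans. `AnonymizationRules`, `anonymize`, and the label-to-placeholder mapping are hypothetical names for illustration; the real configuration format is covered in the Anonymization User Guide.

```python
from dataclasses import dataclass, field

@dataclass
class AnonymizationRules:
    labels: set[str]                                             # entity types to anonymize, e.g. {"PER", "pho"}
    ignore: set[str] = field(default_factory=set)                # terms never anonymized (case-insensitive)
    placeholders: dict[str, str] = field(default_factory=dict)   # label -> token; "***" if absent

def anonymize(text: str, entities: list[tuple[int, int, str]], rules: AnonymizationRules) -> str:
    """Mask selected entities; `entities` are non-overlapping (start, end, label) spans."""
    ignore = {term.lower() for term in rules.ignore}
    masked = [
        (start, end, label)
        for start, end, label in entities
        if label in rules.labels and text[start:end].lower() not in ignore
    ]
    # Work from the last span to the first so earlier offsets stay valid after each replacement.
    for start, end, label in sorted(masked, reverse=True):
        text = text[:start] + rules.placeholders.get(label, "***") + text[end:]
    return text

# Usage on the example sentence from the Example section.
sentence = "Hi, my name is John Smith and you can reach me at john@example.com or 555-123-4567."

def span(value, label):
    start = sentence.index(value)
    return (start, start + len(value), label)

entities = [span("John Smith", "PER"), span("john@example.com", "ema"), span("555-123-4567", "pho")]
rules = AnonymizationRules(labels={"PER", "pho"}, placeholders={"PER": "[PERSON]", "pho": "[PHONE]"})
print(anonymize(sentence, entities, rules))
# Hi, my name is [PERSON] and you can reach me at john@example.com or [PHONE].
```

The email address survives because `ema` is not in the selected labels, which is exactly what entity type selection controls.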
Conversation Clipper Rules
- Ignore List: Terms to exclude from entity extraction (e.g., common greetings that match person name patterns)
- Entity Type Selection: Which entity types to show agents (e.g., show email addresses (ema) and phone numbers (pho) but not person names (PER))
- Display Formatting: How entities are presented in the UI
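The Clipper rules reduce to filtering and grouping. The sketch below assumes entities arrive as `(text, label)` pairs in conversation order and returns one de-duplicated list per label; `clipper_view` and its parameters are hypothetical, and actual display formatting happens in the UI.

```python
from collections import defaultdict

def clipper_view(entities, show_labels, ignore):
    """Group (text, label) pairs by label for display, dropping ignored terms and duplicates."""
    ignore_lc = {term.lower() for term in ignore}
    grouped = defaultdict(list)
    for text, label in entities:
        if label not in show_labels or text.lower() in ignore_lc:
            continue
        if text not in grouped[label]:  # show each value once even if repeated in the chat
            grouped[label].append(text)
    return dict(grouped)

entities = [
    ("Hi", "PER"),                 # greeting mistaken for a name
    ("John Smith", "PER"),
    ("john@example.com", "ema"),
    ("555-123-4567", "pho"),
    ("john@example.com", "ema"),   # repeated later in the conversation
    ("Deepdesk", "ORG"),
]
print(clipper_view(entities, show_labels={"PER", "ema", "pho"}, ignore={"hi"}))
# {'PER': ['John Smith'], 'ema': ['john@example.com'], 'pho': ['555-123-4567']}
```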
Custom Entities via Regex
Beyond the ML-based entity detection, administrators can define custom entity types using regular expressions. This is useful for:
- Account Numbers: Company-specific account ID formats
- Order Numbers: Custom order reference patterns
- Product Codes: Internal SKU or product identifiers
- Custom Identifiers: Any pattern-based entity specific to the business
Custom regex patterns are evaluated alongside ML model predictions, with configurable priority rules.
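The priority rules are configurable and not specified here, so the following is only one plausible scheme: each candidate span carries a numeric priority (for example, custom regex above ML predictions), and a greedy pass keeps the highest-priority, then longest, span wherever candidates overlap. `Candidate`, `resolve_overlaps`, and the `order_no` label are hypothetical.

```python
from typing import NamedTuple

class Candidate(NamedTuple):
    start: int
    end: int
    label: str
    priority: int  # hypothetical scale: higher wins (e.g. custom regex 2, ML model 1)

def resolve_overlaps(candidates: list[Candidate]) -> list[Candidate]:
    """Keep a non-overlapping subset: highest priority first, then longest span, then earliest."""
    ranked = sorted(candidates, key=lambda c: (-c.priority, -(c.end - c.start), c.start))
    chosen: list[Candidate] = []
    for cand in ranked:
        # Accept the candidate only if it does not overlap anything already kept.
        if all(cand.end <= kept.start or cand.start >= kept.end for kept in chosen):
            chosen.append(cand)
    return sorted(chosen, key=lambda c: c.start)

text = "Order AB-12345 for Anna"
candidates = [
    Candidate(6, 8, "ORG", 1),        # ML model tags "AB" as an organization
    Candidate(6, 14, "order_no", 2),  # custom regex matches "AB-12345"
    Candidate(19, 23, "PER", 1),      # ML model tags "Anna" as a person
]
for c in resolve_overlaps(candidates):
    print(text[c.start:c.end], c.label)
# AB-12345 order_no
# Anna PER
```

Greedy selection is simple and deterministic, but it does not guarantee maximum total coverage; a weighted interval-scheduling pass would, if that ever mattered.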
Deployment Architecture
The NER Service follows Deepdesk's standard multi-tenant architecture:
- NER Service: Deployed per account namespace, handles business logic and routing
- NER Model Service: Centrally hosted in each regional project, shared across accounts for cost efficiency
This separation allows:
- Account-specific configuration without duplicating expensive GPU resources
- Centralized model updates and maintenance
- Flexible scaling based on actual inference demand
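The split between the per-account service and the shared model service can be pictured as a thin dispatch layer. In this sketch, `AccountConfig`, `NerService`, and the two injected callables are hypothetical stand-ins: `local_model` represents the in-container spaCy pipeline and `model_service` a client for the regional NER Model Service, which is assumed to receive the account id with each request.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Entity = Tuple[str, str]  # (entity text, label)

@dataclass(frozen=True)
class AccountConfig:
    account: str  # account namespace, e.g. "acme" (hypothetical)
    model: str    # "spacy" = in-container model, "flair" = shared NER Model Service

class NerService:
    """One instance per account namespace: owns the account config and delegates inference."""

    def __init__(self, config: AccountConfig,
                 local_model: Callable[[str], List[Entity]],
                 model_service: Callable[[str, str], List[Entity]]):
        if config.model not in ("spacy", "flair"):
            raise ValueError(f"unsupported model: {config.model!r}")
        self.config = config
        self.local_model = local_model      # runs inside this container
        self.model_service = model_service  # client for the regional, shared service

    def detect(self, text: str) -> List[Entity]:
        if self.config.model == "spacy":
            return self.local_model(text)
        # Assumption: the shared service is told which account is calling.
        return self.model_service(self.config.account, text)

calls = []
def fake_spacy(text):                    # stands in for the in-container spaCy model
    calls.append("local")
    return [("Anna", "PER")]
def fake_model_service(account, text):   # stands in for the NER Model Service client
    calls.append(f"remote:{account}")
    return [("Anna", "PER")]

svc = NerService(AccountConfig("acme", "flair"), fake_spacy, fake_model_service)
print(svc.detect("Hi, this is Anna"), calls)
# [('Anna', 'PER')] ['remote:acme']
```

Business rules such as ignore lists would run in `NerService` after `detect`, so they stay account-specific while GPU inference is shared.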
Related
- Administration › Anonymization - Configuration guide
- Conversation Clipper - End-user documentation
- Data Ingestion - Ingestion pipeline overview