Data Ingestion

Overview

Data ingestion is the process of capturing and processing incoming conversation messages from CX platforms as they are sent and received by customer service agents. This data serves two critical purposes:

Training: Offline training jobs that build and improve ML models
Inference: Real-time suggestion generation based on live conversations

Legacy Note

A legacy Backend V1 ingestion system is maintained for a small number of historical accounts that haven't migrated. This documentation focuses on the current architecture.

Ingestion Methods

There are two primary methods for receiving conversation data from CX platforms:

Webhook-Based Ingestion

The CX platform sends copies of all messages to a dedicated Deepdesk endpoint.

Characteristics:

Messages arrive in the native format of the sending platform
Platform-controlled delivery
Requires platform support for webhooks
More reliable delivery and more comprehensive event coverage than frontend-based ingestion

Used by:

RingCentral/Dimelo
Dialogue Cloud
Tracebuzz
Coosto

Frontend-Based Ingestion (API)

The Deepdesk SDK running in the browser sends messages directly to the Deepdesk API.

Characteristics:

Messages sent in Deepdesk's generic format
Client-side data capture
Works for platforms without webhook support
Faster setup, no backend platform configuration needed

Used by:

Genesys Cloud
Genesys WDE
Salesforce Chat/Email
LivePerson/LiveEngage

Architecture

The Deepdesk ingestion system handles all webhook and frontend ingestion, along with the Conversation CRUD API.

Key Components

Ingestion Flow

1. Message Receipt:
   - Webhook: CX platform → Admin Service → Backend
   - Frontend: Deepdesk SDK → Backend API
2. Schema Conversion (webhook ingestion only, performed by Admin):
   - Admin converts the platform-native format to the Deepdesk generic schema
   - Frontend ingestion already uses the generic schema
3. Anonymization (if enabled):
   - Backend sends the message to the NER Service
   - PII is removed from the message content
   - The anonymized message is returned to Backend
4. Storage:
   - The message is stored in Cloud SQL for Backend operations
   - Engine reads conversations directly from Cloud SQL
   - The message is published to BigQuery for training
   - The message is published to BigQuery for analytics
5. Recommendations:
   - Backend requests recommendations from Engine
   - Engine retrieves the conversation directly from Cloud SQL
   - Engine generates suggestions based on the conversation data
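The five steps above can be condensed into a single function to show their ordering. This is a minimal sketch, not Backend's actual code: the `Message` and `IngestionPipeline` names, the payload field names, and the injected callables (standing in for Admin's schema conversion, the NER Service, Cloud SQL, BigQuery, and Engine) are all hypothetical. In the described architecture the schema conversion runs in Admin before the message reaches Backend; it is folded in here only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Message:
    # Hypothetical stand-in for the Deepdesk generic schema
    conversation_id: str
    text: str
    tags: Dict[str, object] = field(default_factory=dict)


@dataclass
class IngestionPipeline:
    # Hypothetical collaborators: Admin conversion, NER Service, Cloud SQL,
    # BigQuery and Engine are injected as plain callables
    to_generic: Callable[[dict], Message]
    anonymize: Optional[Callable[[str], str]]  # None = anonymization disabled
    store: Callable[[Message], None]
    publish: Callable[[str, Message], None]
    recommend: Callable[[str], List[str]]

    def ingest(self, payload: dict, via_webhook: bool) -> List[str]:
        # Step 2: only webhook payloads arrive in a platform-native format
        message = self.to_generic(payload) if via_webhook else Message(**payload)
        # Step 3: strip PII before anything is persisted or published
        if self.anonymize is not None:
            message.text = self.anonymize(message.text)
        # Step 4: Cloud SQL is the source of truth; BigQuery receives copies
        self.store(message)
        self.publish("training", message)
        self.publish("analytics", message)
        # Step 5: Engine loads the conversation from Cloud SQL itself,
        # so only the conversation id is handed over
        return self.recommend(message.conversation_id)
```

Passing only the conversation id to the recommender mirrors the point that Engine reads conversations from Cloud SQL itself rather than receiving them from Backend.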

Architecture Benefits

Performance: Direct database access with optimized queries
Scalability: Decoupled storage and inference layers
Flexibility: Modern RESTful API design
Reliability: Separation of concerns between ingestion and recommendation services
Simplicity: Single source of truth in Cloud SQL

Webhook Endpoints

Each account has a dedicated webhook endpoint configured in Admin:

https://<account>.deepdesk.com/platform/webhook/<account-uuid>

Where:

<account> is the account code (e.g., vodafoneziggo)
<account-uuid> is found in the account details in Admin
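A small helper that assembles this URL can catch malformed values before they are pasted into a platform's webhook settings. It is a sketch only: the function name is hypothetical, and the rule that account codes are lowercase DNS labels is an assumption based on the code appearing as a subdomain.

```python
import re
import uuid

# Assumption: account codes are valid lowercase DNS labels (e.g. "vodafoneziggo")
_ACCOUNT_CODE = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?")


def webhook_url(account_code: str, account_uuid: str) -> str:
    """Build the per-account webhook endpoint (hypothetical helper)."""
    if not _ACCOUNT_CODE.fullmatch(account_code):
        raise ValueError(f"invalid account code: {account_code!r}")
    # uuid.UUID raises ValueError on malformed input; str() gives the canonical lowercase form
    canonical_uuid = str(uuid.UUID(account_uuid))
    return f"https://{account_code}.deepdesk.com/platform/webhook/{canonical_uuid}"
```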

Webhook Security

Different platforms use different security mechanisms to authenticate webhook requests:

| Platform | Security Method | Description | PlatformConfig Field(s) |
|---|---|---|---|
| RingCentral/Dimelo | Secret Header | Secret included as `X-Dimelo-Secret` header; verification token used for webhook creation/updates | `webhook_secret`, `webhook_verification_token` |
| Tracebuzz | Secret Header | Secret included as `X-Deepdesk-Webhook-Secret` header | `webhook_secret` |
| Coosto | HMAC | Signature in `x-signature` header; hash of payload signed with client secret | `webhook_verification_token` |
| Dialogue Cloud | HMAC/JWT | Platform-specific authentication | `client_secret` |


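Security Note: The shared secret must be configured on both sides, in the CX platform and in the matching PlatformConfig field, so that incoming messages can be authenticated. For HMAC-based authentication, the signature is verified by recomputing a hash of the raw payload with the shared secret and comparing it with the signature sent in the request header.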
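Both mechanisms in the table can be sketched with Python's standard `hmac` module. The helper names are hypothetical, and the HMAC variant used here (SHA-256 with a hex-encoded digest) is an assumption: each platform documents its own algorithm and encoding, so check the platform's specification before relying on it.

```python
import hashlib
import hmac


def verify_secret_header(headers: dict, header_name: str, expected_secret: str) -> bool:
    """Secret Header scheme: the platform sends the shared secret verbatim."""
    # HTTP header names are case-insensitive, so normalise before lookup
    provided = {k.lower(): v for k, v in headers.items()}.get(header_name.lower())
    if provided is None:
        return False
    # Constant-time comparison avoids leaking the secret through timing
    return hmac.compare_digest(provided.encode(), expected_secret.encode())


def verify_hmac_signature(raw_body: bytes, signature: str, secret: str) -> bool:
    """HMAC scheme: recompute the signature over the raw body and compare."""
    # Assumption: SHA-256 HMAC, hex-encoded; the real scheme is platform-specific
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), signature.strip().lower().encode())
```

The signature has to be computed over the request bytes exactly as received; re-serializing parsed JSON can change whitespace or key order and make a valid signature fail. `hmac.compare_digest` is used instead of `==` to avoid timing side channels.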

Profile Assignment

During ingestion, a profile code is determined for each conversation: by Admin for webhook ingestion and by Backend for frontend ingestion. The profile is critical because:

Recommendations are always profile-specific
No profile = no recommendations

Assignment Logic

Webhook Ingestion:

Admin assigns profile based on platform metadata
Profile stored in conversation tags field (JSON)

Frontend Ingestion:

Backend determines the profile based on the included tags
If no tags are present (e.g., Playground), Backend assigns the "current" profile, i.e. the profile already assigned to the conversation
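The frontend-ingestion rules above can be sketched as a small decision function. `assign_profile`, `recommendations_possible`, and `resolve_from_tags` are hypothetical names; the resolver stands in for the tag-based rule evaluation described in the next section. This page does not say what happens when tags are present but match no rule, so the sketch simply returns `None`, which means no recommendations.

```python
from typing import Callable, Mapping, Optional

# Hypothetical resolver type: maps a tag dict to a profile code, or None if no rule matches
TagResolver = Callable[[Mapping[str, object]], Optional[str]]


def assign_profile(
    tags: Mapping[str, object],
    current_profile: Optional[str],
    resolve_from_tags: TagResolver,
) -> Optional[str]:
    """Choose a profile for a frontend-ingested message (sketch)."""
    if tags:
        # Tags present: the profile comes from the tag-based rules
        return resolve_from_tags(tags)
    # No tags (e.g. Playground): keep the profile already on the conversation
    return current_profile


def recommendations_possible(profile: Optional[str]) -> bool:
    # Recommendations are profile-specific: no profile means no recommendations
    return profile is not None
```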

Tag-Based Assignment

Conversations are assigned profiles using platform metadata stored in a JSON tags field at both conversation and message levels. This allows flexible, platform-specific routing based on:

Queue names
Team assignments
Custom platform metadata
Conversation attributes

JsonLogic

See JsonLogic Assignment & Filtering for details on how JsonLogic is used for profile assignment.
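To make the idea concrete, here is a tiny evaluator for a subset of JsonLogic (`var`, `==`, `!=`, `in`, `and`, `or`, `!`) applied to a conversation's tags. It is not the JsonLogic implementation Deepdesk uses, and the first-match-wins rule list and profile codes are hypothetical; the page referenced above describes the actual rule format. Unlike full JsonLogic, `==` here uses Python equality rather than JavaScript-style loose equality, and `and`/`or` return booleans.

```python
from typing import Any, Optional, Sequence, Tuple


def _var(data: Any, path: str = "", default: Any = None) -> Any:
    # Resolve a dot-separated path such as "conversation.queue"; "" returns all data
    for key in (path.split(".") if path else []):
        if isinstance(data, dict) and key in data:
            data = data[key]
        else:
            return default
    return data


def evaluate(rule: Any, data: Any) -> Any:
    """Evaluate a small subset of JsonLogic against `data`."""
    if not isinstance(rule, dict):
        return rule  # literals (strings, numbers, lists) evaluate to themselves
    ((op, args),) = rule.items()  # a JsonLogic rule has exactly one operator
    if not isinstance(args, list):
        args = [args]  # {"var": "queue"} is shorthand for {"var": ["queue"]}
    if op == "var":
        return _var(data, *args)
    if op == "and":
        return all(evaluate(a, data) for a in args)
    if op == "or":
        return any(evaluate(a, data) for a in args)
    values = [evaluate(a, data) for a in args]
    if op == "==":
        return values[0] == values[1]
    if op == "!=":
        return values[0] != values[1]
    if op == "!":
        return not values[0]
    if op == "in":
        # Substring test for strings, membership test for lists
        if isinstance(values[1], str):
            return isinstance(values[0], str) and values[0] in values[1]
        if isinstance(values[1], list):
            return values[0] in values[1]
        return False
    raise ValueError(f"unsupported operator: {op}")


def pick_profile(rules: Sequence[Tuple[dict, str]], tags: dict) -> Optional[str]:
    # Hypothetical first-match-wins ordering over (rule, profile_code) pairs
    for rule, profile_code in rules:
        if evaluate(rule, tags):
            return profile_code
    return None
```

For example, a rule list that sends Benelux billing conversations to a dedicated profile could look like `[({"and": [{"==": [{"var": "queue"}, "Billing"]}, {"in": [{"var": "team"}, ["NL", "BE"]]}]}, "billing-benelux"), ({"==": [{"var": "queue"}, "Billing"]}, "billing")]`, with the more specific rule listed first.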

Anonymization

Anonymization is covered in detail on a dedicated page: see NER Service.

Summary: When enabled, ingested data is processed to remove Personally Identifiable Information (PII) prior to storage and downstream usage (training and live inference). The Backend sends messages to the NER service for PII detection and masking before persisting or publishing records.
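The masking step can be sketched as follows, assuming (hypothetically) that the NER Service reports each detected entity as a character span with a label. The `Entity` type, the span format, and the `[LABEL]` placeholder style are assumptions, not the service's documented response.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Entity:
    # Hypothetical NER result: a labelled character span in the message text
    start: int  # inclusive offset
    end: int    # exclusive offset
    label: str  # e.g. "PERSON", "EMAIL"


def mask(text: str, entities: Iterable[Entity]) -> str:
    """Replace each detected PII span with a [LABEL] placeholder."""
    result = text
    last_start = len(text)
    # Work right to left so earlier offsets remain valid as the text changes length
    for ent in sorted(entities, key=lambda e: e.start, reverse=True):
        if ent.end > last_start:
            continue  # overlaps a span that was already masked; skip it
        result = result[:ent.start] + f"[{ent.label}]" + result[ent.end:]
        last_start = ent.start
    return result
```

Overlapping spans are skipped rather than merged here; a production implementation would need an explicit policy for them.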

Data Processing Pipeline

End-to-End Flow

Implementation References

Backend

Platform Ingestion Handler (Admin)
Ingestion (Backend V2)
Ingestion (Legacy Backend V1)

Engine

Engine Service

Legacy System Notes

All new accounts use the current Backend ingestion system by default. A small number of historical accounts still use the legacy Backend V1 system and will be migrated case by case. During a migration, Admin forwards webhook messages to both systems so that both stay operational until the switch-over is complete.
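The dual forwarding during a migration can be pictured as a fan-out in which a failure in one system does not block delivery to the other. This is a hypothetical sketch: the function name, the target names, and the error-isolation behavior are assumptions, not a description of Admin's implementation.

```python
from typing import Callable, Dict, Mapping

# Hypothetical sender type: delivers one webhook payload to one ingestion system
Forwarder = Callable[[dict], None]


def forward_to_all(payload: dict, targets: Mapping[str, Forwarder]) -> Dict[str, str]:
    """Deliver one webhook payload to every target and report per-target status."""
    status: Dict[str, str] = {}
    for name, send in targets.items():
        try:
            send(payload)
            status[name] = "ok"
        except Exception as exc:  # one failing system must not block the other
            status[name] = f"error: {exc}"
    return status
```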

Next Steps

For detailed operational guidance:

Platform Integrations: Setting up platform connections
Onboarding: Account provisioning and configuration

For feature-specific ingestion:

Agent Assist: Real-time suggestion features
CX Assistants: Automation and AI agents
