Data Ingestion

Overview

Data ingestion is the process of capturing and processing incoming conversation messages from CX platforms as they are sent and received by customer service agents. This data serves two critical purposes:

  1. Training: Offline training jobs that build and improve ML models
  2. Inference: Real-time suggestion generation based on live conversations

Legacy note: A legacy Backend V1 ingestion system is maintained for a small number of historical accounts that have not yet migrated. This documentation focuses on the current architecture.


Ingestion Methods

There are two primary methods for receiving conversation data from CX platforms:

Webhook-Based Ingestion

The CX platform sends copies of all messages to a dedicated Deepdesk endpoint.

Characteristics:

  • Messages arrive in the native format of the sending platform
  • Platform-controlled delivery
  • Requires platform support for webhooks
  • More reliable, with more comprehensive event coverage than client-side capture

Used by:

  • RingCentral/Dimelo
  • Dialogue Cloud
  • Tracebuzz
  • Coosto

Frontend-Based Ingestion (API)

The Deepdesk SDK running in the browser sends messages directly to the Deepdesk API.

Characteristics:

  • Messages sent in Deepdesk's generic format
  • Client-side data capture
  • Works for platforms without webhook support
  • Faster setup, no backend platform configuration needed

Used by:

  • Genesys Cloud
  • Genesys WDE
  • Salesforce Chat/Email
  • LivePerson/LiveEngage

Architecture

The Deepdesk ingestion system handles all webhook and frontend ingestion, along with the Conversation CRUD API.

Key Components

Ingestion Flow

  1. Message Receipt:

    • Webhook: CX platform → Admin Service → Backend
    • Frontend: Deepdesk SDK → Backend API
  2. Schema Conversion (webhook ingestion only, performed by Admin):

    • Admin converts platform-native format to Deepdesk generic schema
    • Frontend ingestion already uses generic schema
  3. Anonymization (if enabled):

    • Backend sends message to NER Service
    • PII data removed from message content
    • Anonymized message returned to Backend
  4. Storage:

    • Message stored in Cloud SQL for Backend operations
    • Engine reads conversations directly from Cloud SQL
    • Message published to BigQuery for training and analytics
  5. Recommendations:

    • Backend requests recommendations from Engine
    • Engine retrieves conversation directly from Cloud SQL
    • Engine generates suggestions based on conversation data
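
The steps above can be sketched as a small pipeline. This is a minimal illustration that starts after schema conversion, not the actual Backend code: the `Message` fields, the `IngestionPipeline` class, and the four injected callables (stand-ins for the NER Service, Cloud SQL, BigQuery, and Engine) are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, List, Optional


@dataclass
class Message:
    """Hypothetical generic message; the real Deepdesk schema is not shown here."""
    conversation_id: str
    text: str
    tags: dict = field(default_factory=dict)


@dataclass
class IngestionPipeline:
    # Collaborators are injected so the flow can run without real services.
    anonymize: Optional[Callable[[str], str]]  # stand-in for the NER Service; None = disabled
    store: Callable[[Message], None]           # stand-in for the Cloud SQL write
    publish: Callable[[Message], None]         # stand-in for the BigQuery publish
    recommend: Callable[[str], List[str]]      # stand-in for the Engine request

    def ingest(self, message: Message) -> List[str]:
        # Step 3: anonymize before anything is persisted or published.
        if self.anonymize is not None:
            message = replace(message, text=self.anonymize(message.text))
        # Step 4: store for Backend and Engine, then publish for training and analytics.
        self.store(message)
        self.publish(message)
        # Step 5: request suggestions for this conversation.
        return self.recommend(message.conversation_id)


# Usage with in-memory stand-ins.
stored, published = [], []
pipeline = IngestionPipeline(
    anonymize=lambda text: text.replace("Alice", "[NAME]"),
    store=stored.append,
    publish=published.append,
    recommend=lambda conversation_id: [f"suggestion for {conversation_id}"],
)
suggestions = pipeline.ingest(Message("c-1", "Hi, this is Alice"))
```

Running anonymization first means that neither the stored nor the published copy ever contains the raw text, which matches the ordering described in the flow.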

Architecture Benefits

  • Performance: Direct database access with optimized queries
  • Scalability: Decoupled storage and inference layers
  • Flexibility: Modern RESTful API design
  • Reliability: Separation of concerns between ingestion and recommendation services
  • Simplicity: Single source of truth in Cloud SQL

Webhook Endpoints

Each account has a dedicated webhook endpoint configured in Admin:

https://<account>.deepdesk.com/platform/webhook/<account-uuid>

Where:

  • <account> is the account code (e.g., vodafoneziggo)
  • <account-uuid> is found in the account details in Admin
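
As an illustration, the URL can be assembled from the two values. The helper name `webhook_url` is hypothetical, the UUID in the usage line is a placeholder, and the hyphenated UUID form is an assumption about how Admin displays it.

```python
from uuid import UUID


def webhook_url(account_code: str, account_uuid: str) -> str:
    """Build the per-account webhook URL from the template above."""
    # Parsing the UUID first makes a typo fail here rather than at the CX platform.
    # Assumption: the URL uses the standard hyphenated, lowercase UUID form.
    normalized_uuid = str(UUID(account_uuid))
    return f"https://{account_code}.deepdesk.com/platform/webhook/{normalized_uuid}"


# Placeholder UUID, not a real account.
url = webhook_url("vodafoneziggo", "123e4567-e89b-12d3-a456-426614174000")
```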

Webhook Security

Different platforms use different security mechanisms to authenticate webhook requests:

| Platform | Security Method | Description | PlatformConfig Field |
|---|---|---|---|
| RingCentral/Dimelo | Secret Header | Secret included as X-Dimelo-Secret header. Verification token used for webhook creation/updates. | webhook_secret, webhook_verification_token |
| Tracebuzz | Secret Header | Secret included as X-Deepdesk-Webhook-Secret header | webhook_secret |
| Coosto | HMAC | Signature in x-signature header. Hash of payload signed with client secret. | webhook_verification_token |
| Dialogue Cloud | HMAC/JWT | Platform-specific authentication | client_secret |

Security Note: The same secret must be configured on both sides (the CX platform and Deepdesk) to authenticate incoming messages. For HMAC-based authentication, the receiver recomputes a keyed hash of the payload with the shared secret and compares it to the signature sent with the request.
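
A minimal sketch of both checks, using only the Python standard library. The function names are hypothetical, and the choice of SHA-256 with a hex-encoded signature is an assumption; each platform defines its own algorithm and encoding, which this page does not specify.

```python
import hashlib
import hmac


def verify_secret_header(received_secret: str, expected_secret: str) -> bool:
    """Secret-header check: the header value must equal the configured secret."""
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters matched.
    return hmac.compare_digest(received_secret.encode(), expected_secret.encode())


def verify_hmac_signature(payload: bytes, received_signature: str, shared_secret: str) -> bool:
    """HMAC check: recompute the signature over the raw body and compare."""
    # Assumption: HMAC-SHA256 with a hex digest; the real platforms may differ.
    expected = hmac.new(shared_secret.encode(), payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), received_signature.encode())


# Usage: sign a body the way a sender would, then verify it.
body = b'{"text": "hello"}'
signature = hmac.new(b"s3cret", body, hashlib.sha256).hexdigest()
is_valid = verify_hmac_signature(body, signature, "s3cret")
```

The signature should be computed over the raw request bytes, before any JSON parsing, because re-serializing the payload can change whitespace or key order and invalidate the hash.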


Profile Assignment

During ingestion, Backend determines the profile code for each conversation. The profile is critical because:

  • Recommendations are always profile-specific
  • No profile = no recommendations

Assignment Logic

Webhook Ingestion:

  • Admin assigns profile based on platform metadata
  • Profile stored in conversation tags field (JSON)

Frontend Ingestion:

  • Backend determines profile based on included tags
  • If no tags present (e.g., Playground), Backend assigns the "current" profile
  • Current profile = profile already assigned to the conversation
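
The frontend fallback can be sketched as follows. `resolve_profile`, `profile_from_tags`, and `lookup` are hypothetical names, and the queue and profile codes are invented for the example. What happens when tags are present but match no profile is not stated here; in this sketch the result is simply `None`, meaning no recommendations.

```python
from typing import Callable, Mapping, Optional


def resolve_profile(
    tags: Optional[Mapping[str, object]],
    current_profile: Optional[str],
    profile_from_tags: Callable[[Mapping[str, object]], Optional[str]],
) -> Optional[str]:
    """Pick the profile code for a frontend-ingested conversation."""
    if tags:
        # Tags present: derive the profile from them.
        return profile_from_tags(tags)
    # No tags (e.g. Playground): keep the profile already on the conversation.
    return current_profile


def lookup(tags: Mapping[str, object]) -> Optional[str]:
    # Hypothetical mapping from a queue tag to a profile code.
    return {"billing": "billing-profile"}.get(tags.get("queue"))


with_tags = resolve_profile({"queue": "billing"}, "default-profile", lookup)
without_tags = resolve_profile({}, "default-profile", lookup)
```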

Tag-Based Assignment

Conversations are assigned profiles using platform metadata stored in a JSON tags field at both conversation and message levels. This allows flexible, platform-specific routing based on:

  • Queue names
  • Team assignments
  • Custom platform metadata
  • Conversation attributes

Note: See JsonLogic Assignment & Filtering for details on how JsonLogic is used for profile assignment.
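
To make tag-based assignment concrete, here is a small sketch that evaluates JsonLogic-style rules against a tags object. It implements only a subset of JsonLogic (`var`, `==`, `!=`, `in`, `and`, `or`, `!`) with plain Python equality and boolean results, so it is not a full JsonLogic implementation. The tag keys, profile codes, and the first-match-wins ordering are assumptions made for the example.

```python
from typing import Any, List, Optional, Tuple


def evaluate(rule: Any, data: dict) -> Any:
    """Evaluate a small JsonLogic subset against `data`."""
    if not isinstance(rule, dict) or len(rule) != 1:
        return rule  # literals evaluate to themselves
    operator, raw_args = next(iter(rule.items()))
    args = raw_args if isinstance(raw_args, list) else [raw_args]
    if operator == "var":
        # Dot notation walks nested tags, e.g. "queue.name".
        value: Any = data
        for key in str(args[0]).split("."):
            if not isinstance(value, dict) or key not in value:
                return args[1] if len(args) > 1 else None  # optional default
            value = value[key]
        return value
    values = [evaluate(arg, data) for arg in args]
    if operator == "==":
        return values[0] == values[1]
    if operator == "!=":
        return values[0] != values[1]
    if operator == "in":
        return values[1] is not None and values[0] in values[1]
    if operator == "and":
        return all(values)
    if operator == "or":
        return any(values)
    if operator == "!":
        return not values[0]
    raise ValueError(f"Unsupported operator: {operator}")


def assign_profile(tags: dict, rules: List[Tuple[str, dict]]) -> Optional[str]:
    """Return the profile code of the first rule that matches the tags."""
    for profile_code, rule in rules:
        if evaluate(rule, tags):
            return profile_code
    return None


# Hypothetical rules: profile codes and tag keys are illustrative only.
rules = [
    ("billing-nl", {"and": [
        {"==": [{"var": "queue"}, "billing"]},
        {"in": [{"var": "language"}, ["nl", "nl-NL"]]},
    ]}),
    ("general", {"==": [{"var": "channel"}, "chat"]}),
]
profile = assign_profile({"queue": "billing", "language": "nl"}, rules)
```

Because the rules are plain JSON, they can be stored per account and changed without a code deployment, which is what makes this style of routing flexible across platforms.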


Anonymization

Anonymization is covered in detail on a dedicated page.

Summary: When enabled, ingested data is processed to remove Personally Identifiable Information (PII) prior to storage and downstream usage (training and live inference). The Backend sends messages to the NER service for PII detection and masking before persisting or publishing records.
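
The masking step can be illustrated with a short sketch. It assumes the NER service returns non-overlapping entity spans as character offsets with a label; that response shape, the `Entity` type, and the `[LABEL]` placeholder format are hypothetical and not taken from the actual service.

```python
from typing import NamedTuple, Sequence


class Entity(NamedTuple):
    """Hypothetical NER result: a labelled character span in the message."""
    start: int
    end: int
    label: str


def mask_entities(text: str, entities: Sequence[Entity]) -> str:
    """Replace each detected span with a placeholder such as [PERSON]."""
    # Work from the last span to the first so earlier offsets stay valid
    # while the string changes length. Assumes spans do not overlap.
    for entity in sorted(entities, key=lambda e: e.start, reverse=True):
        text = text[:entity.start] + f"[{entity.label}]" + text[entity.end:]
    return text


# Usage: offsets are computed here instead of coming from a real NER call.
raw = "My name is Jan Jansen, call 0612345678"
name_start = raw.index("Jan Jansen")
phone_start = raw.index("0612345678")
masked = mask_entities(raw, [
    Entity(name_start, name_start + len("Jan Jansen"), "PERSON"),
    Entity(phone_start, phone_start + len("0612345678"), "PHONE"),
])
```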


Data Processing Pipeline

End-to-End Flow


Implementation References

Backend

Engine


Legacy System Notes

All new accounts use the current Backend ingestion system by default. A small number of historical accounts still use the legacy Backend V1 system and will be migrated on a case-by-case basis. During migration, Admin forwards webhook messages to both systems so that each continues to receive the account's data.


Next Steps

For detailed operational guidance:

For feature-specific ingestion: