Assistant Threads vs Threadless
Deepdesk provides two different modes for managing conversation context with assistants: thread-based and threadless. Understanding the differences between these approaches is crucial for designing effective assistant workflows and managing conversation state.
System Message Structure
Both threadless and thread-based processing use the same prompt construction approach detailed in How Deepdesk Constructs Assistant Prompts: the hard-coded system instructions and user-defined instructions, along with the structured blocks for metadata, parameters, memory, and transcript. This consistent structure ensures uniform formatting and behavior across both conversation management modes.
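As an illustrative sketch of that structure, the snippet below assembles a user message from those blocks. The <metadata> and <transcript> tags follow the examples later on this page; the <parameters> and <memory> tag names, the sample values, and the build_user_message helper are assumptions for illustration, not Deepdesk's actual implementation.

# Illustrative sketch: assembling a user message from the structured blocks.
# The <metadata> and <transcript> tags follow the examples later on this page;
# the <parameters> and <memory> tag names, the sample values, and this helper
# are assumptions for illustration, not Deepdesk's actual implementation.
def build_user_message(metadata: str, parameters: str, memory: str, transcript: str) -> dict:
    content = (
        "<conversation>\n"
        f"<metadata>\n{metadata}\n</metadata>\n"
        f"<parameters>\n{parameters}\n</parameters>\n"
        f"<memory>\n{memory}\n</memory>\n"
        f"<transcript>\n{transcript}\n</transcript>\n"
        "</conversation>\n"
    )
    return {"role": "user", "content": content}

messages = [
    {"role": "system", "content": "System instructions..."},  # hard-coded + user-defined instructions
    build_user_message(
        metadata=" - ID: 123",
        parameters=" - channel: chat",
        memory=" - preferred_contact_method: email",
        transcript="- visitor: New message",
    ),
]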
Thread-Based Conversation Management
Thread-based conversation management creates and maintains a persistent conversation thread for each unique customer interaction. Each time an assistant is called within the same conversation, it has access to the complete history of the interaction.
How Thread-Based Management Works
- A thread is created when a customer interaction begins
- Each assistant call, user message, and tool response is appended to the thread
- The entire context (including metadata, parameters, and previous tool calls) is available to the assistant
- The assistant can reference previous questions, answers, and tool results
- State is naturally maintained between calls to the assistant
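A rough sketch of this accumulation, assuming the thread is a plain list of messages (the helper functions are hypothetical; Deepdesk manages the thread for you):

# Rough sketch of thread accumulation (illustrative only, not Deepdesk internals).
# Each event appends another message to the same list, so later assistant calls
# see the full history.
thread = [
    {"role": "system", "content": "System instructions..."},
]

def append_user_update(content: str) -> None:
    # New customer/agent messages, plus updated metadata, memory, and
    # parameters, arrive as an additional user message on the existing thread.
    thread.append({"role": "user", "content": content})

def append_tool_result(tool_call_id: str, result: str) -> None:
    # Tool responses also stay on the thread, so subsequent evaluations can
    # reference earlier tool results.
    thread.append({"role": "tool", "tool_call_id": tool_call_id, "content": result})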
Thread Structure
A thread contains the following elements:
- System message: Contains the hard-coded system instructions for interpreting metadata and parameters, along with the user-defined instructions.
- User messages: Customer and agent messages with conversation metadata, memory, and parameters
- Assistant messages: Responses from the assistant, including tool calls
- Tool messages: Results returned from tool calls
Here's a simplified view of how a thread develops:
[System Message] → Initial instructions and format
↓
[User Message 1] → First input with metadata, memory, and parameters
↓
[Assistant Message 1] → Assistant response, in this case a tool call
↓
[Tool Message 1] → Tool response
↓
[Assistant Response 1] → First response to user
↓
[User Message 2] → Updated input with new messages, memory, and parameters
↓
[Assistant Message 2] → Follow-up response with context
↓
... continues ...
Sequential Evaluation Process
When an assistant is configured to run for every conversation update, the system ensures orderly processing:
- Wait for completion: The assistant waits for any previous evaluations to complete before starting a new one
- Sequential processing: Evaluations are not run in parallel; each one waits for the previous to finish
- Cumulative updates: When a new evaluation starts, the user message includes all newly added customer and agent messages since the last evaluation
- Complete context: Along with the new messages, the updated metadata, memory, and parameters are also included
This sequential approach ensures that:
- Each evaluation has access to the complete, up-to-date conversation state
- Tool calls and responses don't conflict with each other
- The assistant can build coherently on previous interactions
- Context remains consistent throughout the conversation lifecycle
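The sketch below illustrates one way such sequential, cumulative evaluation could be implemented. It is not Deepdesk's internal code; the SequentialEvaluator class and the conversation object's messages, metadata, memory, and parameters attributes are assumptions used only to mirror the behavior described above.

# Illustrative sketch of sequential, cumulative evaluation (assumed design,
# not Deepdesk internals). A lock serializes evaluations, and each evaluation
# picks up all messages added since the previous one.
import asyncio

class SequentialEvaluator:
    def __init__(self):
        self._lock = asyncio.Lock()   # only one evaluation runs at a time
        self._last_seen = 0           # index of the last message already evaluated

    async def on_conversation_update(self, conversation):
        async with self._lock:        # wait for the previous evaluation to finish
            # Cumulative update: every customer/agent message added since the
            # last evaluation, plus the latest metadata, memory, and parameters.
            new_messages = conversation.messages[self._last_seen:]
            self._last_seen = len(conversation.messages)
            await self._evaluate(
                new_messages=new_messages,
                metadata=conversation.metadata,
                memory=conversation.memory,
                parameters=conversation.parameters,
            )

    async def _evaluate(self, new_messages, metadata, memory, parameters):
        ...  # build the user message from these inputs and call the assistant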
Benefits of Thread-Based Approach
- Continuous context: The assistant remembers the entire conversation history
- Tool result persistence: Results from previous tool calls remain available
- Efficient state management: No need to manually track conversation state
- Natural conversation flow: Assistants can easily reference previous exchanges
- Sequential processing: Orderly evaluation prevents conflicts and maintains consistency
When to Use Thread-Based Assistants
Use the thread-based approach when:
- Assistants need to reference previous questions or answers
- You want to build on previous tool call results
- Context needs to be maintained across the entire customer journey
- You need fine-grained control over conversation state
Threadless Processing
Threadless processing takes a different approach: each call to the assistant is treated as an independent interaction, with all context provided in a single request.
How Threadless Processing Works
- All system instructions are included in a single message
- All user messages are combined into a second message
- The assistant processes everything at once without persistent state
- Each call is independent and doesn't have access to previous tool calls or responses
Threadless Structure
A threadless call contains just two primary elements:
- System message: Contains all user and system instructions and context
- User message: Contains all conversation content, metadata, memory, and parameters
Benefits of Threadless Approach
- Simplicity: No need to manage thread state
- More cost-effective: Less redundancy results in fewer tokens consumed
When to Use Threadless Assistants
Use the threadless approach when:
- You want to avoid maintaining state between calls
- Assistants perform discrete, independent tasks
Implementation Examples
Thread-Based Example
{"messages":[{"role":"system","content":"System instructions..."},{"role":"user","content":"<conversation>\n<metadata>\n - ID: 123\n</metadata>\n</conversation>\n"},{"role":"assistant","tool_calls":[{"id":"call_123","function":{"name":"call_api","arguments":{...}}}]},{"role":"tool","content":"API response data...","tool_call_id":"call_123"},{"role":"assistant","content":"The data has been stored in the database..."},{"role":"user","content":"<conversation>\n<transcript>\n- visitor: New message\n</transcript>\n</conversation>\n"},// Additional messages as the conversation continues...]}
In this example, each interaction builds on previous ones, with the assistant maintaining context across multiple turns.
Threadless Example
{"messages":[{"role":"system","content":"System instructions..."},{"role":"user","content":"<conversation>\n<transcript>\n- visitor: Message 1\n- agent: Response 1\n- visitor: Message 2\n</transcript>\n<metadata>\n - ID: 123\n</metadata>\n</conversation>\n"}]}
In this example, all conversation content is provided in a single message, requiring the assistant to process everything at once.
State Management Considerations
Thread-Based State Management
- Context growth: Threads accumulate content over time and may eventually hit token limits
- Conversation evolution: The thread naturally captures how the conversation develops
- Tool chaining: Results from one tool call can inform subsequent tool calls
Threadless State Management
- Manual context management: You must manually include relevant context
- Restarts: Each call effectively "restarts" the assistant from scratch
💡 Using Memory for State Management
While threadless processing doesn't maintain state between calls automatically, you can use the write_to_memory tool to manually track important information throughout a conversation. The memory is persistent across the entire conversation and appears in the <memory> section of subsequent user messages.
Example use cases:
- Store customer preferences: write_to_memory(key="preferred_contact_method", data="email")
- Track conversation progress: write_to_memory(key="troubleshooting_step", data="3")
- Remember extracted information: write_to_memory(key="account_number", data="12345")
How it works:
- The assistant uses write_to_memory to store key information
- The stored data appears in <memory> sections of future user messages
- The assistant can reference this data using {memory.key_name} syntax
- This provides a way to maintain state even in threadless processing
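As an illustrative sketch, a value stored with write_to_memory in one call could then surface in the next call's user message. The key, the value, and the exact formatting of the <memory> block shown here are assumptions for illustration:

# Hypothetical illustration: after write_to_memory(key="account_number", data="12345")
# in an earlier call, the next call's user message carries the value in its
# <memory> block, where instructions can reference it as {memory.account_number}.
next_user_message = {
    "role": "user",
    "content": (
        "<conversation>\n"
        "<memory>\n"
        " - account_number: 12345\n"
        "</memory>\n"
        "<transcript>\n"
        "- visitor: New message\n"
        "</transcript>\n"
        "</conversation>\n"
    ),
}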
For detailed information about the write_to_memory tool, see the Tool Configuration documentation.
How the System Maintains Conversation Context
In Thread-Based Processing
As shown in the thread-based example above, a thread maintains:
- The complete message history, including system, user, assistant, and tool messages
- Tool calls and their results for reference in subsequent turns
- Updates to conversation metadata and parameters over time
- The assistant's previous responses and reasoning
The system automatically appends new messages to the thread, maintaining the full context as the conversation evolves.
In Threadless Processing
For threadless processing, the system:
- Combines all relevant conversation context into a single message
- Provides complete instructions in the system message
- Does not maintain state between calls
- Requires explicit inclusion of any needed context
Conclusion
Choosing between thread-based and threadless processing depends on your specific use case:
- Thread-based provides richer context and natural conversation flow but requires managing growing context size
- Threadless offers simplicity and a predictable prompt size but requires manual context management
For most customer service scenarios involving multi-turn conversations, the thread-based approach provides a more natural experience. For simple, independent queries or high-volume processing, the threadless approach may be more efficient.
By understanding these different approaches, you can design assistant workflows that balance context richness with performance and scalability requirements.