Assistant Threads vs Threadless
Deepdesk provides two different modes for managing conversation context with assistants: thread-based and threadless. Understanding the differences between these approaches is crucial for designing effective assistant workflows and managing conversation state.
System Message Structure
Both threadless and thread-based processing use the same prompt construction approach detailed in How Deepdesk Constructs Assistant Prompts: the hard-coded system instructions and user-defined instructions, along with the structured blocks for metadata, parameters, memory, and transcript. This consistent structure ensures uniform formatting and behavior across both conversation management modes.
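As an illustrative sketch of that structure, the snippet below assembles a user message from those blocks. The <metadata> and <transcript> tags follow the examples later on this page; the <parameters> and <memory> tag names, the sample values, and the build_user_message helper are assumptions for illustration, not Deepdesk's actual implementation.

# Illustrative sketch: assembling a user message from the structured blocks.
# The <metadata> and <transcript> tags follow the examples later on this page;
# the <parameters> and <memory> tag names, the sample values, and this helper
# are assumptions for illustration, not Deepdesk's actual implementation.
def build_user_message(metadata: str, parameters: str, memory: str, transcript: str) -> dict:
    content = (
        "<conversation>\n"
        f"<metadata>\n{metadata}\n</metadata>\n"
        f"<parameters>\n{parameters}\n</parameters>\n"
        f"<memory>\n{memory}\n</memory>\n"
        f"<transcript>\n{transcript}\n</transcript>\n"
        "</conversation>\n"
    )
    return {"role": "user", "content": content}

messages = [
    {"role": "system", "content": "System instructions..."},  # hard-coded + user-defined instructions
    build_user_message(
        metadata=" - ID: 123",
        parameters=" - channel: chat",
        memory=" - preferred_contact_method: email",
        transcript="- visitor: New message",
    ),
]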
Thread-Based Conversation Management
Thread-based conversation management creates and maintains a persistent conversation thread for each unique customer interaction. Each time an assistant is called within the same conversation, it has access to the complete history of the interaction.
How Thread-Based Management Works
- A thread is created when a customer interaction begins
- Each assistant call, user message, and tool response is appended to the thread
- The entire context (including metadata, parameters, and previous tool calls) is available to the assistant
- The assistant can reference previous questions, answers, and tool results
- State is naturally maintained between calls to the assistant
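A rough sketch of this accumulation, assuming the thread is a plain list of messages (the helper functions are hypothetical; Deepdesk manages the thread for you):

# Rough sketch of thread accumulation (illustrative only, not Deepdesk internals).
# Each event appends another message to the same list, so later assistant calls
# see the full history.
thread = [
    {"role": "system", "content": "System instructions..."},
]

def append_user_update(content: str) -> None:
    # New customer/agent messages, plus updated metadata, memory, and
    # parameters, arrive as an additional user message on the existing thread.
    thread.append({"role": "user", "content": content})

def append_tool_result(tool_call_id: str, result: str) -> None:
    # Tool responses also stay on the thread, so subsequent evaluations can
    # reference earlier tool results.
    thread.append({"role": "tool", "tool_call_id": tool_call_id, "content": result})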
Thread Structure
A thread contains the following elements:
- System message: Contains the hard-coded system instructions for interpreting metadata and parameters, along with the user-defined instructions.
- User messages: Customer and agent messages with conversation metadata, memory, and parameters
- Assistant messages: Responses from the assistant, including tool calls
- Tool messages: Results returned from tool calls
Here's a simplified view of how a thread develops:
[System Message] → Initial instructions and format
↓
[User Message 1] → First input with metadata, memory, and parameters
↓
[Assistant Message 1] → Assistant response, in this case a tool call
↓
[Tool Message 1] → Tool response
↓
[Assistant Response 1] → First response to user
↓
[User Message 2] → Updated input with new messages, memory, and parameters
↓
[Assistant Message 2] → Follow-up response with context
↓
... continues ...
Sequential Evaluation Process
When an assistant is configured to run for every conversation update, the system ensures orderly processing:
- Wait for completion: The assistant waits for any previous evaluations to complete before starting a new one
- Sequential processing: Evaluations are not run in parallel; each one waits for the previous to finish
- Cumulative updates: When a new evaluation starts, the user message includes all newly added customer and agent messages since the last evaluation
- Complete context: Along with the new messages, the updated metadata, memory, and parameters are also included
This sequential approach ensures that:
- Each evaluation has access to the complete, up-to-date conversation state
- Tool calls and responses don't conflict with each other
- The assistant can build coherently on previous interactions
- Context remains consistent throughout the conversation lifecycle
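The sketch below illustrates one way such sequential, cumulative evaluation could be implemented. It is not Deepdesk's internal code; the SequentialEvaluator class and the conversation object's messages, metadata, memory, and parameters attributes are assumptions used only to mirror the behavior described above.

# Illustrative sketch of sequential, cumulative evaluation (assumed design,
# not Deepdesk internals). A lock serializes evaluations, and each evaluation
# picks up all messages added since the previous one.
import asyncio

class SequentialEvaluator:
    def __init__(self):
        self._lock = asyncio.Lock()   # only one evaluation runs at a time
        self._last_seen = 0           # index of the last message already evaluated

    async def on_conversation_update(self, conversation):
        async with self._lock:        # wait for the previous evaluation to finish
            # Cumulative update: every customer/agent message added since the
            # last evaluation, plus the latest metadata, memory, and parameters.
            new_messages = conversation.messages[self._last_seen:]
            self._last_seen = len(conversation.messages)
            await self._evaluate(
                new_messages=new_messages,
                metadata=conversation.metadata,
                memory=conversation.memory,
                parameters=conversation.parameters,
            )

    async def _evaluate(self, new_messages, metadata, memory, parameters):
        ...  # build the user message from these inputs and call the assistant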
Benefits of Thread-Based Approach
- Continuous context: The assistant remembers the entire conversation history
- Tool result persistence: Results from previous tool calls remain available
- Efficient state management: No need to manually track conversation state
- Natural conversation flow: Assistants can easily reference previous exchanges
- Sequential processing: Orderly evaluation prevents conflicts and maintains consistency
When to Use Thread-Based Assistants
Use the thread-based approach when:
- Assistants need to reference previous questions or answers
- You want to build on previous tool call results
- Context needs to be maintained across the entire customer journey
- You need fine-grained control over conversation state
Threadless Processing
Threadless processing takes a different approach: each call to the assistant is treated as an independent interaction, with all context provided in a single request.
How Threadless Processing Works
- All system instructions are included in a single message
- All user messages are combined into a second message
- The assistant processes everything at once without persistent state
- Each call is independent and doesn't have access to previous tool calls or responses
Threadless Structure
A threadless call contains just two primary elements:
- System message: Contains all user and system instructions and context
- User message: Contains all conversation content, metadata, memory, and parameters
Benefits of Threadless Approach
- Simplicity: No need to manage thread state
- More cost-effective: Less redundancy results in fewer tokens consumed
When to Use Threadless Assistants
Use the threadless approach when:
- You want to avoid maintaining state between calls
- Assistants perform discrete, independent tasks
Implementation Examples
Thread-Based Example
{"messages":[{"role":"system","content":"System instructions..."},{"role":"user","content":"<conversation>\n<metadata>\n - ID: 123\n</metadata>\n</conversation>\n"},{"role":"assistant","tool_calls":[{"id":"call_123","function":{"name":"call_api","arguments":{...}}}]},{"role":"tool","content":"API response data...","tool_call_id":"call_123"},{"role":"assistant","content":"The data has been stored in the database..."},{"role":"user","content":"<conversation>\n<transcript>\n- visitor: New message\n</transcript>\n</conversation>\n"},// Additional messages as the conversation continues...]}
In this example, each interaction builds on previous ones, with the assistant maintaining context across multiple turns.
Threadless Example
{"messages":[{"role":"system","content":"System instructions..."},{"role":"user","content":"<conversation>\n<transcript>\n- visitor: Message 1\n- agent: Response 1\n- visitor: Message 2\n</transcript>\n<metadata>\n - ID: 123\n</metadata>\n</conversation>\n"}]}
In this example, all conversation content is provided in a single message, requiring the assistant to process everything at once.
State Management Considerations
Thread-Based State Management
- Context growth: Threads accumulate content over time and may eventually hit token limits
- Conversation evolution: The thread naturally captures how the conversation develops
- Tool chaining: Results from one tool call can inform subsequent tool calls
Threadless State Management
- Manual context management: You must manually include relevant context
- Restarts: Each call effectively "restarts" the assistant from scratch
💡 Using Memory for State Management
While threadless processing doesn't maintain state between calls automatically, you can use the write_to_memory tool to manually track important information throughout a conversation. The memory is persistent across the entire conversation and appears in the <memory> section of subsequent user messages.
Example use cases:
- Store customer preferences: write_to_memory(key="preferred_contact_method", data="email")
- Track conversation progress: write_to_memory(key="troubleshooting_step", data="3")
- Remember extracted information: write_to_memory(key="account_number", data="12345")
How it works:
- The assistant uses write_to_memory to store key information
- The stored data appears in <memory> sections of future user messages
- The assistant can reference this data using {memory.key_name} syntax
- This provides a way to maintain state even in threadless processing
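As an illustrative sketch, a value stored with write_to_memory in one call could then surface in the next call's user message. The key, the value, and the exact formatting of the <memory> block shown here are assumptions for illustration:

# Hypothetical illustration: after write_to_memory(key="account_number", data="12345")
# in an earlier call, the next call's user message carries the value in its
# <memory> block, where instructions can reference it as {memory.account_number}.
next_user_message = {
    "role": "user",
    "content": (
        "<conversation>\n"
        "<memory>\n"
        " - account_number: 12345\n"
        "</memory>\n"
        "<transcript>\n"
        "- visitor: New message\n"
        "</transcript>\n"
        "</conversation>\n"
    ),
}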
For detailed information about the write_to_memory tool, see the Tool Configuration documentation.
How the System Maintains Conversation Context
In Thread-Based Processing
As shown in the thread-based example above, a thread maintains:
- The complete message history, including system, user, assistant, and tool messages
- Tool calls and their results for reference in subsequent turns
- Updates to conversation metadata and parameters over time
- The assistant's previous responses and reasoning
The system automatically appends new messages to the thread, maintaining the full context as the conversation evolves.
In Threadless Processing
For threadless processing, the system:
- Combines all relevant conversation context into a single message
- Provides complete instructions in the system message
- Does not maintain state between calls
- Requires explicit inclusion of any needed context
Conclusion
Choosing between thread-based and threadless processing depends on your specific use case:
- Thread-based provides richer context and natural conversation flow but requires managing growing context size
- Threadless offers simplicity and a predictable prompt size but requires manual context management
For most customer service scenarios involving multi-turn conversations, the thread-based approach provides a more natural experience. For simple, independent queries or high-volume processing, the threadless approach may be more efficient.
By understanding these different approaches, you can design assistant workflows that balance context richness with performance and scalability requirements.