Managing Conversation History in LangChain Chatbots

In the previous tutorials, we built a chatbot that could remember conversational context using message history, session IDs, and prompt templates. At that stage, the chatbot worked correctly, but there was an important problem left unsolved.

As conversations grow longer, the list of messages sent to the language model grows without bound. Every language model has a limited context window, and once that limit is exceeded, the model either fails or silently drops context. This makes conversation history management a critical requirement for any real chatbot.

In this tutorial, we focus on how to manage and trim conversation history so that only the most relevant messages are passed to the model.

Why Conversation History Must Be Managed

A chatbot works by repeatedly sending the conversation history back to the model. If left unmanaged:

  • Message lists grow indefinitely
  • Token usage increases rapidly
  • Context windows overflow
  • Performance degrades
  • Important instructions may be pushed out of scope

Therefore, we need a mechanism that limits the number of tokens while still preserving useful context.

LangChain provides exactly such a mechanism through a helper called trim_messages.

Introducing trim_messages

The trim_messages utility allows us to reduce the size of a conversation history before it is sent to the model. Instead of blindly sending everything, we can define rules such as:

  • Maximum number of tokens to keep
  • Whether to always keep system messages
  • Whether partial messages are allowed
  • Which part of the conversation should be prioritized

Importing Required Components

We begin by importing the relevant classes:

from langchain_core.messages import SystemMessage, HumanMessage, AIMessage
from langchain_core.messages import trim_messages

Here:

  • SystemMessage, HumanMessage, and AIMessage represent structured conversation roles
  • trim_messages is the helper responsible for trimming history

Creating a Message Trimmer

Let’s configure a trimmer that keeps the most recent context while respecting a token limit.

trimmer = trim_messages(
    max_tokens=45,
    strategy="last",
    token_counter=model,
    include_system=True,
    allow_partial=False,
    start_on="human"
)

What Each Parameter Means

  • max_tokens
    The maximum number of tokens allowed after trimming.
  • strategy="last"
    Keeps the most recent messages, discarding older ones first.
  • token_counter=model
    Uses the same model tokenizer to count tokens accurately.
  • include_system=True
    Ensures system instructions are never dropped.
  • allow_partial=False
    Prevents cutting messages in the middle.
  • start_on="human"
    Ensures trimming starts cleanly at a human message boundary.
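
These parameters can be combined in other ways as well. As a brief sketch (illustrative only, not part of the chatbot we are building), strategy="first" keeps the start of the conversation instead of the end; note that include_system and start_on apply only to the "last" strategy, so they are omitted here:

head_trimmer = trim_messages(
    max_tokens=45,
    strategy="first",      # retain messages from the beginning of the history
    token_counter=model
)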

Example: Trimming a Static Conversation

Let’s define a conversation manually:

messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="Hello, my name is Bob."),
    AIMessage(content="Hi Bob."),
    HumanMessage(content="I like vanilla ice cream."),
    AIMessage(content="Nice choice."),
    HumanMessage(content="What is two plus two?"),
    AIMessage(content="Four."),
    HumanMessage(content="Thanks!")
]

Now apply the trimmer:

trimmed_messages = trimmer.invoke(messages)

If the token limit is high enough, most messages remain. If we lower max_tokens, older parts of the conversation are removed first.
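
To see this without calling a model, note that trim_messages also accepts a plain counting function: with token_counter=len, every message counts as exactly one token, so the arithmetic is deterministic. A small sketch (the budget of 6 is an illustrative value):

demo_trimmer = trim_messages(
    max_tokens=6,            # illustrative budget: six messages in total
    strategy="last",
    token_counter=len,       # count each message as a single token
    include_system=True,
    allow_partial=False,
    start_on="human"
)

for message in demo_trimmer.invoke(messages):
    print(type(message).__name__, "->", message.content)

With this budget, the system message plus the last five messages survive, and the opening greeting is dropped.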

This demonstrates an important principle: Memory is selective, not absolute.

Why Some Questions Stop Working After Trimming

Suppose we reduce the token limit aggressively. The earlier message “I like vanilla ice cream” might be trimmed out. If the user later asks, “What ice cream do I like?”, the model can no longer answer correctly, because that information is no longer part of the context. This is expected behavior, and it highlights why the trimming strategy matters.
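
The count-based sketch from the previous section makes this concrete. With an aggressive budget (the value 3 is illustrative), almost everything except the system message is dropped, and the ice cream preference no longer appears:

aggressive_trimmer = trim_messages(
    max_tokens=3,            # room for the system message and very little else
    strategy="last",
    token_counter=len,
    include_system=True,
    allow_partial=False,
    start_on="human"
)

trimmed = aggressive_trimmer.invoke(messages)
print(any("vanilla" in m.content for m in trimmed))   # prints False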

Applying Trimming Inside a Chain

Manually trimming messages before every call is inconvenient. Instead, we integrate trimming directly into the chain using LCEL (the LangChain Expression Language).

Required Imports

from operator import itemgetter
from langchain_core.runnables import RunnablePassthrough

Building a Chain with Trimming

chain = (
    RunnablePassthrough.assign(
        messages=itemgetter("messages") | trimmer
    )
    | prompt
    | model
)

What This Does

  1. Extracts the messages input
  2. Applies trimming
  3. Passes trimmed messages into the prompt
  4. Sends the result to the model

This ensures every invocation automatically trims history.
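
For example, invoking the chain directly with the static conversation defined earlier (this assumes the prompt and model from this tutorial are already in scope):

response = chain.invoke({"messages": messages})
print(response.content)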

Why Trimming Can Change Model Answers

If trimmed context removes certain facts, the model will no longer be able to reference them. This is not a bug—it is a direct consequence of respecting the context window.

However, more recent and important information remains intact, which is usually what we want in a conversational system.

Wrapping the Trimmed Chain with Message History

Now we combine everything:

  • Prompt templates
  • Message trimming
  • Session-based memory

chatbot = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="messages"
)

At this point, the chatbot:

  • Stores conversation per session
  • Trims messages automatically
  • Sends only relevant context to the model

Complete Runnable Program

Below is the full, end-to-end program that demonstrates conversation history trimming in a chatbot.

import os
from dotenv import load_dotenv
from operator import itemgetter

from langchain_groq import ChatGroq
from langchain_core.messages import (
    SystemMessage,
    HumanMessage,
    AIMessage,
    trim_messages
)
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnablePassthrough, RunnableWithMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory

# --------------------------------------------------
# Load environment variables
# --------------------------------------------------
load_dotenv()
GROQ_API_KEY = os.getenv("GROQ_API_KEY")

# --------------------------------------------------
# Initialize model
# --------------------------------------------------
model = ChatGroq(
    api_key=GROQ_API_KEY,
    model="llama-3.1-8b-instant"
)

# --------------------------------------------------
# Prompt template
# --------------------------------------------------
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant."),
        MessagesPlaceholder(variable_name="messages")
    ]
)

# --------------------------------------------------
# Message trimmer
# --------------------------------------------------
trimmer = trim_messages(
    max_tokens=45,
    strategy="last",
    token_counter=model,
    include_system=True,
    allow_partial=False,
    start_on="human"
)

# --------------------------------------------------
# Chain with trimming
# --------------------------------------------------
chain = (
    RunnablePassthrough.assign(
        messages=itemgetter("messages") | trimmer
    )
    | prompt
    | model
)

# --------------------------------------------------
# Session history store
# --------------------------------------------------
store = {}

def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

# --------------------------------------------------
# Wrap chain with history
# --------------------------------------------------
chatbot = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="messages"
)

# --------------------------------------------------
# Run conversation
# --------------------------------------------------
config = {"configurable": {"session_id": "chat_5"}}

print("\n--- Conversation Start ---\n")

chatbot.invoke(
    {"messages": [HumanMessage(content="Hello, my name is Bob.")]},
    config=config
)

chatbot.invoke(
    {"messages": [HumanMessage(content="I like vanilla ice cream.")]},
    config=config
)

chatbot.invoke(
    {"messages": [HumanMessage(content="What is two plus two?")]},
    config=config
)

response = chatbot.invoke(
    {"messages": [HumanMessage(content="What is my name?")]},
    config=config
)

print("AI:", response.content)

print("\n--- Conversation End ---\n")

Output

--- Conversation Start ---

AI: I don't have any information about your name. This conversation just started, and I don't have any prior knowledge about you. If you'd like to share your name, I'd be happy to chat with you!

--- Conversation End ---
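
The final answer shows the trimmer doing its job: with max_tokens=45, the introduction “Hello, my name is Bob.” no longer fits within the budget by the time the last question is asked, so the model cannot recall the name. As a quick experiment (the budget of 200 below is an illustrative value), rebuild the trimmer with a larger limit and reconstruct the chain and chatbot that reference it; the introduction then typically survives trimming, and the model can answer with the name.

# Illustrative: a larger budget usually lets the introduction survive trimming.
trimmer = trim_messages(
    max_tokens=200,
    strategy="last",
    token_counter=model,
    include_system=True,
    allow_partial=False,
    start_on="human"
)

# The existing chain and chatbot hold a reference to the old trimmer,
# so both must be rebuilt after changing it.
chain = (
    RunnablePassthrough.assign(
        messages=itemgetter("messages") | trimmer
    )
    | prompt
    | model
)

chatbot = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="messages"
)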