When working with real-world APIs, we often receive large JSON responses that contain deeply nested objects, arrays, and long fields. Before we send this data to an LLM or convert it into embeddings for retrieval, we must break it into smaller, meaningful chunks.
However, depending on your LangChain version, the built-in RecursiveJsonSplitter may not always work reliably, especially if:
- Your JSON starts with a list
- The JSON contains deeply nested lists
- You are using an older version of langchain-text-splitters
- You encounter an unexpected IndexError or missing arguments like convert_lists
To overcome these issues, we will build a custom JSON splitter that works consistently across all Python and LangChain environments.
This tutorial explains how this splitter is designed and how you can use it to chunk JSON data effectively.
Why Do We Need a Custom JSON Splitter?
Many built-in JSON splitting tools assume the JSON root is always a dictionary and that lists can be safely traversed. But real API responses are often:
- lists of dictionaries,
- dictionaries containing lists,
- or complex nested structures.
For example:
[
  { "id": 1, "title": "Sample", "body": "..." },
  { "id": 2, "title": "Another", "body": "..." }
]
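You can confirm the shape of such a response with a few lines of standard-library Python; a splitter that assumes a dict root will trip on it. The sample payload below is illustrative:

```python
import json

# A list-rooted response, as returned by many REST APIs
raw = '[{"id": 1, "title": "Sample"}, {"id": 2, "title": "Another"}]'
parsed = json.loads(raw)
print(type(parsed).__name__)  # list, not dict

# Wrapping it in a dict gives every splitter a dict root to start from
wrapped = {"items": parsed}
print(type(wrapped).__name__)  # dict
```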
Some versions of LangChain fail to handle this kind of structure due to:
- incorrect path tracking
- missing internal functions
- no support for list-rooted JSON
- no support for certain constructor arguments
This leads to errors such as:
IndexError: list index out of range
or
TypeError: unexpected keyword argument 'convert_lists'
To avoid all of these issues, we will create a robust, custom splitter.
Objective of the Custom JSON Splitter
The custom splitter we create will:
- Traverse the JSON recursively
- Break it into chunks based on maximum character size
- Handle both dicts and lists correctly
- Maintain structural meaning
- Work with any version of LangChain
- Never throw IndexError when traversing lists
- Produce clean JSON fragments suitable for embeddings or LLMs
This gives complete control over how JSON is broken apart.
Designing the Custom JSON Splitter
We will build a recursive function that:
- Converts each JSON object into a pretty-printed JSON string
- Checks if its size exceeds the allowed limit
- If it fits — store it as a chunk
- If it doesn’t fit — recursively process its children
It handles:
- Dictionaries
- Lists
- Primitives
- Large strings
This guarantees safe splitting for all kinds of JSON structures.
Complete Working Code
Here is the full implementation of the custom JSON splitter along with an example API:
import json
import requests
from typing import Any, List

# ------------------------------------------------------------
# Custom Recursive JSON Splitter
# ------------------------------------------------------------

def json_to_chunks(data: Any, max_chars: int = 300) -> List[str]:
    """Recursively split a JSON object into chunks based on max character size."""
    chunks = []

    def recurse(obj, path=""):
        text = json.dumps(obj, indent=2)

        # If object fits into one chunk, use it directly
        if len(text) <= max_chars:
            chunks.append(text)
            return

        # If dictionary, split key-by-key
        if isinstance(obj, dict):
            for key, value in obj.items():
                recurse(value, f"{path}/{key}")
            return

        # If list, split item-by-item
        if isinstance(obj, list):
            for index, item in enumerate(obj):
                recurse(item, f"{path}[{index}]")
            return

        # For large primitive values or long strings
        chunks.append(text)

    recurse(data)
    return chunks

# ------------------------------------------------------------
# 1. Load JSON from an API
# ------------------------------------------------------------
url = "https://jsonplaceholder.typicode.com/posts"
data = requests.get(url).json()

# Wrap JSON list in a dict for consistent processing
wrapped_json = {"items": data}

print("Fetched JSON successfully.")
print("------------------------------------------------------------")

# ------------------------------------------------------------
# 2. Split JSON into chunks
# ------------------------------------------------------------
chunks = json_to_chunks(wrapped_json, max_chars=300)

print("Total chunks generated:", len(chunks))
print("------------------------------------------------------------")
print("\nFIRST 3 CHUNKS:\n")
for c in chunks[:3]:
    print(c)
print("------------------------------------------------------------")

# ------------------------------------------------------------
# 3. OPTIONAL: Convert chunks to LangChain Document objects
# ------------------------------------------------------------
try:
    from langchain.schema import Document
    docs = [Document(page_content=c, metadata={}) for c in chunks]
    print("\nCreated Document objects:", len(docs))
except ImportError:
    print("\nLangChain not installed or incompatible version.")
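To see the splitter's behavior without hitting the network, here is a self-contained sketch that runs the same json_to_chunks logic on a small inline payload. The sample data is made up for illustration:

```python
import json
from typing import Any, List

def json_to_chunks(data: Any, max_chars: int = 300) -> List[str]:
    """Same splitter as above, repeated so this snippet runs standalone."""
    chunks: List[str] = []

    def recurse(obj, path=""):
        text = json.dumps(obj, indent=2)
        if len(text) <= max_chars:      # fits: keep as one chunk
            chunks.append(text)
            return
        if isinstance(obj, dict):       # too big: descend key-by-key
            for key, value in obj.items():
                recurse(value, f"{path}/{key}")
            return
        if isinstance(obj, list):       # too big: descend item-by-item
            for index, item in enumerate(obj):
                recurse(item, f"{path}[{index}]")
            return
        chunks.append(text)             # oversized primitive: keep whole

    recurse(data)
    return chunks

# Hypothetical payload: a list root wrapped in a dict, as in the script above
payload = {"items": [{"id": i, "body": "lorem ipsum " * 10} for i in range(5)]}
chunks = json_to_chunks(payload, max_chars=150)

print("Total chunks:", len(chunks))
print(json.loads(chunks[0])["id"])  # each chunk is itself valid JSON
```

Because the serialized payload exceeds 150 characters, the splitter descends through the wrapping dict and the list, emitting one chunk per element.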
How the Custom Splitter Works
Here is the logic:
1. Convert JSON to Pretty String
We use:
text = json.dumps(obj, indent=2)
This gives a human-readable string representation.
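For instance, a small dict serializes like this:

```python
import json

obj = {"id": 1, "title": "Sample"}
text = json.dumps(obj, indent=2)
print(text)
# {
#   "id": 1,
#   "title": "Sample"
# }
print(len(text))  # this character count drives the splitting decision
```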
2. Check if the chunk fits
If the serialized fragment is at most max_chars characters long, it is stored as a single chunk and recursion stops.
3. Handle Dictionaries
If obj is a dict:
- Process each key/value pair separately
- Split deeply nested objects automatically
4. Handle Lists
If obj is a list:
- Process each element individually
- Good for arrays of objects returned by APIs
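Isolating just the list branch: when the serialized array exceeds the limit, each element becomes its own candidate for recursion. The sample posts below are illustrative:

```python
import json

posts = [
    {"id": 1, "title": "Sample"},
    {"id": 2, "title": "Another"},
]

# The splitter walks the array element by element instead of
# truncating or discarding anything.
pieces = [json.dumps(item, indent=2) for item in posts]
print(len(pieces))  # one fragment per array element
```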
5. Handle Strings and Primitives
If a value is a primitive (a long string, number, or boolean) whose serialized form still exceeds max_chars, it cannot be split any further, so it is stored whole as a single oversized chunk rather than raising an error.
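A minimal sketch of this base case, using a made-up oversized string:

```python
import json

max_chars = 300
long_text = "x" * 500  # a primitive larger than the limit

# A string has no structure to recurse into, so the splitter
# stores it whole instead of truncating it or failing.
chunk = json.dumps(long_text, indent=2)
print(len(chunk) > max_chars)  # True: one oversized chunk
```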
