Lanka Developers Community

    We Cut Our OpenAI Costs by 50% Without Changing the Model

    Artificial Intelligence
    ai
    • isuru mahesh perera


      When we launched our AI chatbot product, it worked beautifully.

      Great responses. Happy users. Strong engagement.

      But there was one problem: the cost.

      On average, our OpenAI cost per user was around $5 per month.

      At first, that didn’t feel scary. But then we did the math.

      If we scaled to 1,000 users, that would be $5,000 per month.

      LLM pricing depends on the number of tokens and the number of requests. We couldn’t reduce requests, but we could reduce tokens.

      And that’s where TOON came in.

      What Is TOON?
      TOON (Token-Oriented Object Notation) is a compact, human-readable encoding of the JSON data model designed specifically for LLM prompts.

      https://toonformat.dev

      Think of it as a translation layer:

      Use JSON programmatically in your backend.
      Convert it to TOON before sending it to the LLM.
      After getting the response, convert it back to JSON.
      TOON combines:

      YAML-style indentation for nesting
      CSV-style tabular format for uniform arrays
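
      To make the idea concrete, here is a minimal encoder for flat, uniform arrays, written from scratch for illustration. The function name `to_toon` and the quoting rules are our own simplification, not the official tooling at toonformat.dev:

      ```python
      def to_toon(name, rows):
          """Encode a uniform list of flat dicts as a TOON-style table.

          Sketch only: field names are declared once in the header,
          then one CSV-style line per record. Lists are joined with
          '|'; strings containing commas or spaces are quoted.
          """
          fields = list(rows[0])
          header = f"{name}[{len(rows)}]{{{','.join(fields)}}}:"
          lines = [header]
          for row in rows:
              cells = []
              for field in fields:
                  value = row[field]
                  if isinstance(value, bool):
                      value = "true" if value else "false"
                  elif isinstance(value, list):
                      value = "|".join(map(str, value))
                  cell = str(value)
                  if "," in cell or " " in cell:
                      cell = f'"{cell}"'
                  cells.append(cell)
              lines.append("  " + ",".join(cells))
          return "\n".join(lines)
      ```

      The key design choice is treating the array like a CSV table with a typed header, so the structural cost is paid once instead of once per row.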
      The Real Problem: JSON Is Token-Expensive
      Imagine our chatbot needs to send product data to the model:

      
      [
        {
          "id": 201,
          "title": "Wireless Bluetooth Headphones",
          "seoTitle": "Best Wireless Bluetooth Headphones 2025",
          "seoDescription": "Premium noise-cancelling wireless headphones with 30-hour battery life.",
          "tags": ["electronics", "audio", "wireless"],
          "price": 129.99,
          "currency": "USD",
          "active": true
        },
        {
          "id": 202,
          "title": "Ergonomic Office Chair",
          "seoTitle": "Comfortable Ergonomic Office Chair for Long Hours",
          "seoDescription": "Adjustable lumbar support chair designed for productivity and comfort.",
          "tags": ["furniture", "office", "comfort"],
          "price": 249.00,
          "currency": "USD",
          "active": true
        }
      ]
      

      At first glance, this looks normal.

      But look carefully.

      For every product, we repeat:

      "id"
      "title"
      "seoTitle"
      "seoDescription"
      "tags"
      "price"
      "currency"
      "active"
      

      That’s 8 keys repeated for every single row.

      If you have 100 products, that’s 800 repeated field names.

      LLMs don’t need that repetition.

      They just need structure.
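
      A rough back-of-the-envelope calculation shows the scale of that repetition. This counts characters, using the common heuristic of roughly 4 characters per token for GPT-style tokenizers; the exact token savings depend on the model's tokenizer:

      ```python
      # Each JSON key costs roughly len(key) + 4 chars per row: two
      # quotes, a colon, and a space ('"key": '). TOON declares the
      # key once, costing roughly len(key) + 1 chars (key + comma).
      keys = ["id", "title", "seoTitle", "seoDescription",
              "tags", "price", "currency", "active"]
      rows = 100

      repeated_names = len(keys) * rows            # → 800
      json_overhead = sum(len(k) + 4 for k in keys) * rows
      toon_overhead = sum(len(k) + 1 for k in keys)
      chars_saved = json_overhead - toon_overhead  # → 8340

      print(repeated_names, chars_saved)
      ```

      Over 8,000 characters of pure key repetition for 100 products — roughly 2,000 tokens spent on structure the model never needed.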

      The Same Data in TOON

      products[2]{id,title,seoTitle,seoDescription,tags,price,currency,active}:
        201,"Wireless Bluetooth Headphones","Best Wireless Bluetooth Headphones 2025","Premium noise-cancelling wireless headphones with 30-hour battery life.","electronics|audio|wireless",129.99,USD,true
        202,"Ergonomic Office Chair","Comfortable Ergonomic Office Chair for Long Hours","Adjustable lumbar support chair designed for productivity and comfort.","furniture|office|comfort",249.00,USD,true
      

      What changed?

      Field names declared once
      No repeated keys
      No curly braces per row
      No structural duplication
      Tags flattened in a compact way
      Same information, far fewer tokens.
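
      Going the other way, a small decoder can recover the original dicts from a table like the one above. This is a sketch of our own (`from_toon` is not an official library API): it handles a single flat table, restores numbers and booleans by simple rules, and leaves '|'-joined lists as plain strings:

      ```python
      import csv
      import io
      import re

      def from_toon(text):
          """Parse one flat TOON-style table back into a list of dicts."""
          lines = text.strip().splitlines()
          head = re.match(r"\w+\[(\d+)\]\{([^}]*)\}:", lines[0])
          count, fields = int(head.group(1)), head.group(2).split(",")
          records = []
          for line in lines[1:1 + count]:
              # csv handles the quoted cells that contain commas
              cells = next(csv.reader(io.StringIO(line.strip())))
              row = {}
              for field, cell in zip(fields, cells):
                  if cell in ("true", "false"):
                      row[field] = cell == "true"
                  else:
                      try:
                          row[field] = float(cell) if "." in cell else int(cell)
                      except ValueError:
                          row[field] = cell
              records.append(row)
          return records
      ```

      Because the row lines are effectively CSV, the standard-library `csv` reader does the heavy lifting on quoting.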

      Our Case
      Our production workload is in Python, and integrating TOON was straightforward.

      We only modified the translation layer:

      Convert JSON → TOON before sending to the LLM
      Convert TOON → JSON after receiving the response
      That was it.
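
      Concretely, the translation layer can sit as a thin wrapper around the model call. In this sketch, `ask_with_toon` and `call_model` are hypothetical names — `call_model` is any injected function that takes a prompt string and returns the model's text, so you can plug in your own OpenAI client — and the inline encoder is a stripped-down stand-in for a real TOON implementation:

      ```python
      def ask_with_toon(products, question, call_model):
          """Send product context to the LLM in TOON instead of JSON.

          `call_model` is an injected function (prompt -> response text),
          keeping the formatting layer independent of the LLM client.
          """
          # Stripped-down TOON encoding: field names once, then rows.
          fields = list(products[0])
          header = f"products[{len(products)}]{{{','.join(fields)}}}:"
          rows = ["  " + ",".join(str(p[f]) for f in fields)
                  for p in products]
          context = "\n".join([header, *rows])

          prompt = f"Context:\n{context}\n\nQuestion: {question}"
          return call_model(prompt)
      ```

      The backend keeps working with plain JSON dicts; only the prompt-assembly step changes.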

      By changing only the input and output formatting layer, we reduced our OpenAI costs by nearly 50%.

      Based on our experience, TOON works best when:

      Your data is flattened or mostly flat
      You’re sending lists of similar objects
      Your structure is simple or semi-structured
      You’re not feeding deeply nested hierarchies
      In our case, we weren’t sending complex nested objects to the model.
      Most of our chatbot context consisted of structured records: product data, analytics summaries, and event logs, which fit perfectly into TOON’s tabular format.

      If your data looks like:

      [
       { "id": "", "title": "", "price": "" },
       { "id": "", "title": "", "price": "" }
      ]
      

      You’re likely paying for repeated keys you don’t actually need.
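
      A quick way to tell is to check whether your records are uniform and flat. This `is_toon_friendly` helper is our own heuristic, not part of any TOON tooling:

      ```python
      def is_toon_friendly(records):
          """Heuristic: TOON's tabular form pays off when every record
          has the same keys and no values are nested objects."""
          if not records:
              return False
          keys = set(records[0])
          for rec in records:
              if set(rec) != keys:
                  return False      # non-uniform rows break the table
              for value in rec.values():
                  if isinstance(value, dict):
                      return False  # nested objects don't flatten well
          return True
      ```

      If this returns True for your prompt context, the repeated keys are pure overhead.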

      Final Thought
      If you’re building with LLMs, take a closer look at your token usage.

      Before changing models or reducing features, inspect the structure of the data you’re sending.

      In many cases, the waste isn’t in the intelligence — it’s in the formatting.

      We only changed our translation layer.

      Nothing else.

      And that alone reduced our OpenAI costs by nearly 50%.

      Try TOON in your own workload.
      Measure the difference.
      Run the numbers.

      You may find that scaling becomes much more affordable than you expected.
