Lanka Developers Community

    We Cut Our OpenAI Costs by 50% Without Changing the Model

    Artificial Intelligence
    ai
    • isuru mahesh perera


      When we launched our AI chatbot product, it worked beautifully.

      Great responses. Happy users. Strong engagement.

      But there was one problem: the cost.

      On average, our OpenAI cost per user was around $5 per month.

      At first, that didn’t feel scary. But then we did the math.

      If we scaled to 1,000 users, that would be $5,000 per month.

      LLM pricing depends on the number of tokens and the number of requests. We couldn’t reduce requests, but we could reduce tokens.

      And that’s where TOON came in.

      What Is TOON?
      TOON (Token-Oriented Object Notation) is a compact, human-readable encoding of the JSON data model designed specifically for LLM prompts.

      https://toonformat.dev

      Think of it as a translation layer:

      Use JSON programmatically in your backend.
      Convert it to TOON before sending it to the LLM.
      After getting the response, convert it back to JSON.
      TOON combines:

      YAML-style indentation for nesting
      CSV-style tabular format for uniform arrays
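
      To make the idea concrete, here is a minimal encoder for flat, uniform arrays, written from scratch for illustration. The function name `to_toon` and the quoting rules are our own simplification, not the official tooling at toonformat.dev:

      ```python
      def to_toon(name, rows):
          """Encode a uniform list of flat dicts as a TOON-style table.

          Sketch only: field names are declared once in the header,
          then one CSV-style line per record. Lists are joined with
          '|'; strings containing commas or spaces are quoted.
          """
          fields = list(rows[0])
          header = f"{name}[{len(rows)}]{{{','.join(fields)}}}:"
          lines = [header]
          for row in rows:
              cells = []
              for field in fields:
                  value = row[field]
                  if isinstance(value, bool):
                      value = "true" if value else "false"
                  elif isinstance(value, list):
                      value = "|".join(map(str, value))
                  cell = str(value)
                  if "," in cell or " " in cell:
                      cell = f'"{cell}"'
                  cells.append(cell)
              lines.append("  " + ",".join(cells))
          return "\n".join(lines)
      ```

      The key design choice is treating the array like a CSV table with a typed header, so the structural cost is paid once instead of once per row.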
      The Real Problem: JSON Is Token-Expensive
      Imagine our chatbot needs to send product data to the model:

      
      [
        {
          "id": 201,
          "title": "Wireless Bluetooth Headphones",
          "seoTitle": "Best Wireless Bluetooth Headphones 2025",
          "seoDescription": "Premium noise-cancelling wireless headphones with 30-hour battery life.",
          "tags": ["electronics", "audio", "wireless"],
          "price": 129.99,
          "currency": "USD",
          "active": true
        },
        {
          "id": 202,
          "title": "Ergonomic Office Chair",
          "seoTitle": "Comfortable Ergonomic Office Chair for Long Hours",
          "seoDescription": "Adjustable lumbar support chair designed for productivity and comfort.",
          "tags": ["furniture", "office", "comfort"],
          "price": 249.00,
          "currency": "USD",
          "active": true
        }
      ]
      

      At first glance, this looks normal.

      But look carefully.

      For every product, we repeat:

      "id"
      "title"
      "seoTitle"
      "seoDescription"
      "tags"
      "price"
      "currency"
      "active"
      

      That’s 8 keys repeated for every single row.

      If you have 100 products, that’s 800 repeated field names.

      LLMs don’t need that repetition.

      They just need structure.
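
      A rough back-of-the-envelope calculation shows the scale of that repetition. This counts characters, using the common heuristic of roughly 4 characters per token for GPT-style tokenizers; the exact token savings depend on the model's tokenizer:

      ```python
      # Each JSON key costs roughly len(key) + 4 chars per row: two
      # quotes, a colon, and a space ('"key": '). TOON declares the
      # key once, costing roughly len(key) + 1 chars (key + comma).
      keys = ["id", "title", "seoTitle", "seoDescription",
              "tags", "price", "currency", "active"]
      rows = 100

      repeated_names = len(keys) * rows            # → 800
      json_overhead = sum(len(k) + 4 for k in keys) * rows
      toon_overhead = sum(len(k) + 1 for k in keys)
      chars_saved = json_overhead - toon_overhead  # → 8340

      print(repeated_names, chars_saved)
      ```

      Over 8,000 characters of pure key repetition for 100 products — roughly 2,000 tokens spent on structure the model never needed.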

      The Same Data in TOON

      products[2]{id,title,seoTitle,seoDescription,tags,price,currency,active}:
        201,"Wireless Bluetooth Headphones","Best Wireless Bluetooth Headphones 2025","Premium noise-cancelling wireless headphones with 30-hour battery life.","electronics|audio|wireless",129.99,USD,true
        202,"Ergonomic Office Chair","Comfortable Ergonomic Office Chair for Long Hours","Adjustable lumbar support chair designed for productivity and comfort.","furniture|office|comfort",249.00,USD,true
      

      What changed?

      Field names declared once
      No repeated keys
      No curly braces per row
      No structural duplication
      Tags flattened in a compact way
      Same information, far fewer tokens.
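
      Going the other way, a small decoder can recover the original dicts from a table like the one above. This is a sketch of our own (`from_toon` is not an official library API): it handles a single flat table, restores numbers and booleans by simple rules, and leaves '|'-joined lists as plain strings:

      ```python
      import csv
      import io
      import re

      def from_toon(text):
          """Parse one flat TOON-style table back into a list of dicts."""
          lines = text.strip().splitlines()
          head = re.match(r"\w+\[(\d+)\]\{([^}]*)\}:", lines[0])
          count, fields = int(head.group(1)), head.group(2).split(",")
          records = []
          for line in lines[1:1 + count]:
              # csv handles the quoted cells that contain commas
              cells = next(csv.reader(io.StringIO(line.strip())))
              row = {}
              for field, cell in zip(fields, cells):
                  if cell in ("true", "false"):
                      row[field] = cell == "true"
                  else:
                      try:
                          row[field] = float(cell) if "." in cell else int(cell)
                      except ValueError:
                          row[field] = cell
              records.append(row)
          return records
      ```

      Because the row lines are effectively CSV, the standard-library `csv` reader does the heavy lifting on quoting.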

      Our Case
      Our production workload is in Python, and integrating TOON was straightforward.

      We only modified the translation layer:

      Convert JSON → TOON before sending to the LLM
      Convert TOON → JSON after receiving the response
      That was it.
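
      Concretely, the translation layer can sit as a thin wrapper around the model call. In this sketch, `ask_with_toon` and `call_model` are hypothetical names — `call_model` is any injected function that takes a prompt string and returns the model's text, so you can plug in your own OpenAI client — and the inline encoder is a stripped-down stand-in for a real TOON implementation:

      ```python
      def ask_with_toon(products, question, call_model):
          """Send product context to the LLM in TOON instead of JSON.

          `call_model` is an injected function (prompt -> response text),
          keeping the formatting layer independent of the LLM client.
          """
          # Stripped-down TOON encoding: field names once, then rows.
          fields = list(products[0])
          header = f"products[{len(products)}]{{{','.join(fields)}}}:"
          rows = ["  " + ",".join(str(p[f]) for f in fields)
                  for p in products]
          context = "\n".join([header, *rows])

          prompt = f"Context:\n{context}\n\nQuestion: {question}"
          return call_model(prompt)
      ```

      The backend keeps working with plain JSON dicts; only the prompt-assembly step changes.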

      By changing only the input and output formatting layer, we reduced our OpenAI costs by nearly 50%.

      Based on our experience, TOON works best when:

      Your data is flattened or mostly flat
      You’re sending lists of similar objects
      Your structure is simple or semi-structured
      You’re not feeding deeply nested hierarchies
      In our case, we weren’t sending complex nested objects to the model.
      Most of our chatbot context consisted of structured records: product data, analytics summaries, and event logs, which fit perfectly into TOON’s tabular format.

      If your data looks like:

      [
       { "id": "", "title": "", "price": "" },
       { "id": "", "title": "", "price": "" }
      ]
      

      You’re likely paying for repeated keys you don’t actually need.
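
      A quick way to tell is to check whether your records are uniform and flat. This `is_toon_friendly` helper is our own heuristic, not part of any TOON tooling:

      ```python
      def is_toon_friendly(records):
          """Heuristic: TOON's tabular form pays off when every record
          has the same keys and no values are nested objects."""
          if not records:
              return False
          keys = set(records[0])
          for rec in records:
              if set(rec) != keys:
                  return False      # non-uniform rows break the table
              for value in rec.values():
                  if isinstance(value, dict):
                      return False  # nested objects don't flatten well
          return True
      ```

      If this returns True for your prompt context, the repeated keys are pure overhead.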

      Final Thought
      If you’re building with LLMs, take a closer look at your token usage.

      Before changing models or reducing features, inspect the structure of the data you’re sending.

      In many cases, the waste isn’t in the intelligence — it’s in the formatting.

      We only changed our translation layer.

      Nothing else.

      And that alone reduced our OpenAI costs by nearly 50%.

      Try TOON in your own workload.
      Measure the difference.
      Run the numbers.

      You may find that scaling becomes much more affordable than you expected.
