Developing an AI-Generated Encyclopedia: Scaling Challenges, Error Handling, and Cost Reduction with OpenAI’s API

Seemingly overnight, AI has become the driving force behind almost every major app. Whether it’s Intercom’s support chatbot, Duolingo’s upcoming chatbot integration, or Zapier’s “connect a chatbot to anything” feature, AI is either already here or on its way.

Much like the invention of the calculator, we have a choice: we can either embrace this technology or risk being left behind.

In this article, we will develop an AI-generated Encyclopedia website. We won’t be manually inputting any data; instead, we will rely entirely on AI-generated information created through a procedural approach. As we progress through the article, we will gradually expand our prompt and then address important aspects such as scaling, error handling, and cost reduction. Through this demonstration, we aim to showcase the capabilities of the API and how to overcome its shortcomings when developing a procedurally created application.

Prompting types

The magic behind interfacing with ChatGPT is prompting. While this article won’t delve deeply into the topic of prompting, it’s worth mentioning the three types of OpenAI prompts:

  1. system: sets the model’s behavior and persona for the entire request

  2. user: the message (or messages) the model is asked to respond to

  3. assistant: previous model responses, supplied back to the API to provide context

It’s important to note that API requests lack context. OpenAI/ChatGPT doesn’t “remember” previous requests, and there’s no inherent way to link requests together apart from manually including user prompts and assistant response prompts in the messages array of your request.

For example:

//https://api.openai.com/v1/chat/completions
{
  "model": "gpt-3.5-turbo",
  "messages": [
    {"role": "user", "content":  "Say a random name"},
    {"role": "assistant", "content":  "Jason"},
    {"role": "user", "content":  "What name did you give me previously?"}
  ]
}
// Jason
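
In application code, this means carrying the conversation history yourself and replaying it with every call. Below is a minimal Go sketch of that pattern; the chatComplete helper is a hypothetical wrapper around the HTTP call to the chat completions endpoint:

// Carrying context across requests manually (sketch).
type Message struct {
  Role    string `json:"role"`
  Content string `json:"content"`
}

// ask appends the question to the history, calls the API, and stores the
// assistant's reply so the next request can "remember" it.
func ask(history []Message, question string) ([]Message, string, error) {
  history = append(history, Message{Role: "user", Content: question})
  reply, err := chatComplete("gpt-3.5-turbo", history) // hypothetical HTTP wrapper
  if err != nil {
    return history, "", err
  }
  history = append(history, Message{Role: "assistant", Content: reply})
  return history, reply, nil
}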

Accurate Prompting and Invalid JSON

The first challenge we need to address is decoding JSON from a ChatGPT response. Even a small error like a tab instead of a space, an extra quote, or a misplaced comma can render the entire response useless.

Since ChatGPT is primarily designed for conversation and trained to engage in dialogue, we must overcome common response failures to achieve a success rate above 90% before we even reach the error handling/retry system discussed later.

Let’s create our first simple prompt for our Encyclopedia app. The response can then be used to populate our hypothetical user interface seamlessly.

You are an encyclopedia. You will derive a topic from the user’s prompt, you will respond with a JSON object with the following fields: title, introduction, description, conclusion

//https://api.openai.com/v1/chat/completions
{
  "model": "gpt-3.5-turbo",
  "messages": [
    {"role": "system", "content":  "You are an encyclopedia. You will derive a topic from the user's prompt, you will respond with a JSON object with the following fields, title, introduction, description, conclusion"},
    {"role": "user", "content":  "Whale shark"}
  ]
}
{"response": "Here's a JSON object with information about Whale sharks:\n```\n{\n   \"introduction\": \"The whale shark (Rhincodon typus) is the largest shark and the largest fish in the world, reaching lengths of 40 feet or more. Despite its massive size, it is a filter feeder, feeding primarily on planktonic organisms.\", \n   \"description\": \"Whale sharks are found in warm waters of the world's oceans, including the Indian, Pacific, and Atlantic oceans. They have distinctive light-colored spots and stripes on their dark-colored skin. The whale shark's mouth can open up to 4 feet wide, allowing it to consume large amounts of plankton. They are known to migrate long distances, but not much is known about their breeding habits. Although they are a threatened species, they are not considered to be dangerous to humans.\", \n   \"conclusion\": \"Whale sharks are fascinating creatures that have captured the attention of many people around the world. Although they are not dangerous to humans, they are under threat from overfishing and habitat destruction. Conservation efforts are underway to protect this majestic species and ensure its survival for generations to come.\"\n}\n```"}

In the end, we obtain an almost valid JSON object, but it contains additional elements such as newline characters (\n), a conversational introduction, ``` code-fence markers, and sometimes even a conclusion. These elements are present because we are working with a chatbot trained to provide content in a human-readable format. While this format works well in a copy/paste setup within the ChatGPT UI, it poses challenges for the automated generation we are aiming for.

To address this issue, we require a more reliable strategy. Although we could try asking ChatGPT to avoid including newline characters or introductions, based on my experience, such attempts often result in lower success rates as they go against the fine-tuning of ChatGPT.

Instead, we should adopt a safer approach:

Assume ChatGPT will provide invalid JSON responses regardless of clever prompting, and defend against all possible scenarios.

By removing probability from the equation, we aim for a high success rate rather than settling for an average one. In this case, the solution is relatively straightforward. We can utilize regular expressions (Regex) to eliminate the invalid JSON characters from any response.

After thorough experimentation, I have found that the following four cleanup steps (three Regex patterns plus one string replacement) are sufficient for our purposes; a Go sketch of the full cycle follows the list. It’s important to run them in the following order and avoid combining them:

  1. \{[\s\S]*\}: matches the brace-delimited JSON object itself, letting us discard any conversational introduction or conclusion around it

  2. \\n|[^\x20-\x7e]: removes literal \n escapes and any character outside the printable ASCII range

  3. ",\\s*\\}" : Removes a trailing comma before the closing bracket (a pattern that chatGPT likes to do sometimes

  4. strings.ReplaceAll(content, `\\\"`, `\"`): ChatGPT has a habit of triple-escaping a JSON quote so that it appears escaped in the chat UI; this collapses it back to a single escaped quote
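
Put together, a minimal Go sketch of the full cleanup cycle might look like this (cleanJSON is my own name for it, and step 4 reflects my best reading of the triple-escape fix):

import (
  "regexp"
  "strings"
)

// cleanJSON runs the four cleanup steps, in order, over a raw ChatGPT
// response and returns a string that should now be valid JSON.
func cleanJSON(content string) string {
  // 1. Keep only the brace-delimited object, dropping surrounding chatter.
  if match := regexp.MustCompile(`\{[\s\S]*\}`).FindString(content); match != "" {
    content = match
  }
  // 2. Strip literal \n escapes and anything outside printable ASCII.
  content = regexp.MustCompile(`\\n|[^\x20-\x7e]`).ReplaceAllString(content, "")
  // 3. Drop a trailing comma before the closing brace.
  content = regexp.MustCompile(`,\s*\}`).ReplaceAllString(content, "}")
  // 4. Collapse triple-escaped quotes back to single escaped quotes.
  content = strings.ReplaceAll(content, `\\\"`, `\"`)
  return content
}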

With the cleanup cycle applied to the response, we can now focus our prompts on content rather than formatting. Once the response has undergone this process, we obtain a valid JSON object that is ready for decoding and further use.

{
 "title": "Whale Shark (Rhincodon typus)",
 "introduction": "The whale shark (Rhincodon typus) is the largest shark and the largest fish in the world, reaching lengths of 40 feet or more. Despite its massive size, it is a filter feeder, feeding primarily on planktonic organisms.",
 "description": "Whale sharks are found in warm waters of the world's oceans, including the Indian, Pacific, and Atlantic oceans. They have distinctive light-colored spots and stripes on their dark-colored skin. The whale shark's mouth can open up to 4 feet wide, allowing it to consume large amounts of plankton. They are known to migrate long distances, but not much is known about their breeding habits. Although they are a threatened species, they are not considered to be dangerous to humans.",
 "conclusion": "Whale sharks are fascinating creatures that have captured the attention of many people around the world. Although they are not dangerous to humans, they are under threat from overfishing and habitat destruction. Conservation efforts are underway to protect this majestic species and ensure its survival for generations to come."
}
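
Decoding the cleaned response is then a standard unmarshal. A minimal Go sketch, with a struct mirroring the schema from our system prompt:

import "encoding/json"

// Article mirrors the fields requested in the system prompt.
type Article struct {
  Title        string `json:"title"`
  Introduction string `json:"introduction"`
  Description  string `json:"description"`
  Conclusion   string `json:"conclusion"`
}

func decodeArticle(cleaned string) (Article, error) {
  var article Article
  err := json.Unmarshal([]byte(cleaned), &article)
  return article, err
}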

This approach is applicable to any prompt, regardless of the specific data or app requirements. It’s remarkably straightforward, making it virtually inexcusable for any application not to leverage AI-generated content.

Token Limitations

OpenAI measures API usage in tokens. Usage is billed per 1,000 tokens, and each model has a specific token limit. In the case of ChatGPT3.5, the token limit for a single request is 4,096 tokens. This count encompasses the system prompt, the user prompt, any previous contextual responses (as assistant prompts), as well as the resulting content.

In practice, when developing an actual application, it is highly likely that you will quickly surpass the 4,096 token limit. Unfortunately, the OpenAI API does not handle such cases gracefully. If a request exceeds the token limit, the API response will include "finish_reason": "length", indicating that the request ran out of tokens (more on this in the error handling section).

Consequently, you will receive a partially completed JSON object that cannot be easily rectified through regular expression matching.
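
In practice, it pays to detect this case before attempting any decoding. A minimal Go sketch that inspects the finish_reason field of the response:

import (
  "encoding/json"
  "errors"
)

// completionResponse is a minimal view of the chat completion response,
// just enough to detect token overflow.
type completionResponse struct {
  Choices []struct {
    Message struct {
      Content string `json:"content"`
    } `json:"message"`
    FinishReason string `json:"finish_reason"`
  } `json:"choices"`
}

func contentOrOverflow(body []byte) (string, error) {
  var resp completionResponse
  if err := json.Unmarshal(body, &resp); err != nil {
    return "", err
  }
  if len(resp.Choices) == 0 {
    return "", errors.New("no choices returned")
  }
  if resp.Choices[0].FinishReason == "length" {
    // The request ran out of tokens: the JSON is truncated, so retry
    // with smaller length limits instead of trying to repair it.
    return "", errors.New("token limit exceeded")
  }
  return resp.Choices[0].Message.Content, nil
}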

To overcome the token limitation, we have several options in our toolkit:

  1. Choose between content or context: If your application requires contextual history in the form of a conversation, it’s crucial to keep the requests and responses small. This allows you to provide several previous interactions as context while staying within the token limit.

  2. Set length limits: It’s essential to define a clear schema with specific length limits for all fields to control the size of the response. Always leave a buffer of 500–1000 tokens to ensure you don’t exceed the token limit.

  3. Scale requests outward: Structure your prompts in a sensible manner and perform them asynchronously. By separating prompts and handling them independently, you can effectively manage the token usage and avoid token overflow.


The Meat of it: Procedurally generated encyclopedia

Now that we know the basics, let’s expand on our previous example by providing a more concrete schema that includes upper limits on length to prevent token overflow. Additionally, we will try to maximize the query’s length. One of the great aspects of an online encyclopedia is the inclusion of clickable related content. So let’s add clickable sub-topics, dedicated sections, related links, citations, and more.

Since the schema becomes more complex with arrays and sub-objects, we will also provide an example. Examples are a powerful tool to guide ChatGPT in providing the desired format. However, be sure to specify that they are formatting examples to prevent ChatGPT from using the provided values as guidance. We won’t have enough tokens to spare to provide real values in the example. Every token counts at this stage.

Request One

You are an encyclopedia. You will derive a topic from the user's prompt, you will respond with a JSON object with the following fields: "title" (<100 chars), "introduction" (<2000 chars), "description" (<4000 chars), "conclusion" (<2000 chars), "citations" (array of citations as strings), "sections" (array of 5 JSON objects with the following keys: "sectionTitle" (<100 chars), "sectionDescription" (<4000 chars)). Encase any potential subtopics or keywords of interest with @ signs

formatting example: {"title": "", "introduction": "", "description": "", "conclusion": "", "citations": [""], "sections": [{"sectionTitle": "", "sectionDescription": ""}]}

Request Two

You are an encyclopedia. You will derive a topic from the user's prompt. You will respond with a JSON object with the following fields: "relatedTopics" (an array of 10 strings that relate to the provided topic), "externalLinks" (an array of strings with external links to learn more about the topic), "unrelatedTopics" (an array of 10 strings that describe random, unrelated topics)

formatting example: {"relatedTopics": [""], "unrelatedTopics": [""], "externalLinks": [""]}

//Request 1 (title, introduction, description, conclusion, citations, sections)
{
  "model": "gpt-3.5-turbo",
  "messages": [
   {"role": "system", "content": "You are an encyclopedia. You will derive a topic from the user's prompt, you will respond with a JSON object with the following fields: \"title\"(<100 chars), \"introduction\"(<2000 chars), \"description\"(<4000 chars), \"conclusion\"(<2000 chars), \"citations\": (array of citations as strings), \"sections\": (array of 5 JSON objects with the following keys: \"sectionTitle\"(<100 chars), \"sectionDescription\"(<4000 chars)). Encase any potential subtopics or keywords of interest with @ signs. Formatting example: {\"title\": \"\", \"introduction\": \"\", \"description\": \"\", \"conclusion\": \"\", \"citations\": [\"\"], \"sections\": [{\"sectionTitle\": \"\", \"sectionDescription\": \"\"}]}"},
   {"role": "user", "content":  "whale shark"}
 ]
}
//Request 2 (relatedTopics, unrelatedTopics, externalLinks)
{
  "model": "gpt-3.5-turbo",
  "messages": [
   {"role": "system", "content": "You are an encyclopedia. You will derive a topic from the user's prompt. You will respond with a JSON object with the following fields: \"relatedTopics\"(an array of 10 strings that relate to the provided topic), \"externalLinks\" (an array of strings with external links to learn more about the topic), \"unrelatedTopics\"(an array of 10 strings that describe random, unrelated topics) formatting example; {\"relatedTopics\":[\"\"], \"unrelatedTopics\":[\"\"], \"externalLinks\": [\"\"]}"},
   {"role": "user", "content":  "whale shark"}
 ]
}

With the response provided, we have successfully generated 80% of a Wikipedia page. It was that easy. Notice that we utilize the schema keys to provide information and meaning to the request (such as asking for subtopics).

The key to expanding this feature set lies in separating the content into distinct requests. By doing so, we can overcome token limitations and continue to grow the feature set as needed.
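
Because the two requests are independent, we can also fire them concurrently. A minimal Go sketch; requestEncyclopedia, requestOnePrompt, and requestTwoPrompt are hypothetical stand-ins for the HTTP call (plus cleanup and decoding) and the two system prompts above:

import "sync"

// generatePage runs both encyclopedia requests in parallel.
func generatePage(topic string) (article, links string, errs []error) {
  var wg sync.WaitGroup
  var mu sync.Mutex

  run := func(systemPrompt string, out *string) {
    defer wg.Done()
    result, err := requestEncyclopedia(systemPrompt, topic) // hypothetical helper
    mu.Lock()
    defer mu.Unlock()
    if err != nil {
      errs = append(errs, err)
      return
    }
    *out = result
  }

  wg.Add(2)
  go run(requestOnePrompt, &article) // title, sections, citations...
  go run(requestTwoPrompt, &links)   // related topics, external links...
  wg.Wait()
  return article, links, errs
}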

Multiple Generation Phases

In more advanced examples, there may be situations where you cannot generate all of your prompts simultaneously. For instance, one prompt might rely on the output of a previous prompt. In such cases, you can group your initial generation prompts together, process them, and if they succeed in generation and decoding, extract the minimum necessary data for context. It’s crucial to remain mindful of token limitations throughout the process. Let’s consider an example where we want to create a summary of all the sections. The second phase of generation might follow this structure:

// pseudocode (Go-flavored)
var contextualPrompt string
for _, section := range output.Sections {
  contextualPrompt += section.SectionTitle + ", "
}
prompt := "generate a summary about " + output.Title + " related to the following concepts: " + contextualPrompt

Alternatively, if your total token count is well below the limit, you could also include the context as user/assistant prompts, which ChatGPT will process more effectively. However, if you are constrained by the token limit, you can add the context more concisely, following the example above.

By employing multiple generation phases, you can accomplish complex tasks that require interdependent prompts. Each phase builds upon the output of the previous one, allowing you to generate comprehensive and meaningful content.

Cost and Selecting a model

In the past, integrating Artificial Intelligence into any platform required substantial funding. However, the landscape has changed significantly, and now it may only require occasional scavenging of coins from under your couch — albeit with some important considerations.

OpenAI offers a variety of models designed for various tasks. The ChatGPT3.5 chat completion model may not be the most obvious choice for returning JSON, as demonstrated in the previous sections. However, it offers distinct advantages: it is cost-effective, fast, and highly optimized. It’s not just slightly cheaper; it’s cheaper by a factor of 10 versus Davinci, and by even more versus ChatGPT4.

In our previous example, the response from OpenAI included the following usage details:

"usage": {
  "prompt_tokens": 177,
  "completion_tokens": 1070,
  "total_tokens": 1247
 }

The total cost for this request comes to roughly $0.0025 (1,247 tokens × $0.002 per 1,000 tokens), or about a quarter of a cent.

ChatGPT3.5, priced at $0.002 per 1,000 tokens, has a maximum token limit of 4,096 per request. Therefore, even if we hypothetically maximize a request to 4,096 tokens, the total cost would still be just $0.008.

A single request to ChatGPT3.5 can never exceed eight thousandths of a dollar

For this reason, it is always advisable to choose ChatGPT3.5 unless you have a specific reason to use another model.

Let’s compare the potential total cost of 4,096 tokens in each model (pricing at the time of writing):

  1. ChatGPT3.5 at $0.002 per 1,000 tokens: roughly $0.008

  2. Davinci at $0.02 per 1,000 tokens: roughly $0.08

  3. ChatGPT4 (8K context) at $0.03 per 1,000 prompt tokens and $0.06 per 1,000 completion tokens: roughly $0.12 to $0.25

Despite the hype around it, ChatGPT4 offers only minor improvements; the difference is mostly unnoticeable unless the task involves complex reasoning. Even OpenAI’s documentation states:

For many basic tasks, the difference between GPT-4 and GPT-3.5 models is not significant. However, in more complex reasoning situations, GPT-4 is much more capable than any of our previous models.

Therefore, it is recommended to refrain from using ChatGPT4 unless absolutely necessary. The extra expense is typically unjustified, and it comes with degraded stability and significantly slower response times.

The same applies to Davinci, unless you are specifically using it for training/fine-tuning. Choosing Davinci for regular usage will cause your cost per customer to skyrocket by over a factor of 10, and that doesn’t even include the training cost.

Error Handling and Logging

Procedurally generating a website by pulling JSON snippets from a Chat Completion model is not an exact science. There is a significant amount of variability in the types of responses, which can change over time. When developing a client-facing application, anything less than a 100% success rate on user interaction is simply unacceptable. To account for failures, we need a robust error handling and logging system that connects to a retry system, shielding the user from these issues.

Why Do Responses Fail?

Responses fail for a handful of recurring reasons, all of which we have touched on already: the response is not valid JSON even after the cleanup cycle, the request runs out of tokens ("finish_reason": "length"), the API itself errors out (rate limits, timeouts, outages), or the JSON is valid but missing fields from the schema.

To handle these error scenarios, implement a robust retry system with exponential backoff. This system should retry each request several times, using a content-length decrement approach for larger queries. If a request still fails, the entire phase should be retried, again with exponential backoff. Each non-API error should be logged and analyzed by a human until it no longer occurs. A retry limit must be provided to avoid infinite loops.
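
A minimal Go sketch of such a retry loop; maxRetries and the generate callback are assumptions standing in for your own generation pipeline:

import (
  "fmt"
  "time"
)

const maxRetries = 5 // assumed limit; tune to your tolerance

// retryGenerate retries a generation step with exponential backoff and a
// hard retry limit so we never loop forever.
func retryGenerate(generate func() (string, error)) (string, error) {
  backoff := time.Second
  for attempt := 1; attempt <= maxRetries; attempt++ {
    result, err := generate()
    if err == nil {
      return result, nil
    }
    // Log every failure so a human can analyze recurring patterns.
    fmt.Printf("attempt %d failed: %v (retrying in %s)\n", attempt, err, backoff)
    time.Sleep(backoff)
    backoff *= 2 // 1s, 2s, 4s, 8s, ...
  }
  return "", fmt.Errorf("generation failed after %d attempts", maxRetries)
}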

Final Tips…

Finally, there are some small things to look out for:

Conclusion

The AI revolution is transforming the landscape of our favorite apps. In this article, we developed an AI-generated Encyclopedia website with surprisingly little effort. By relying entirely on AI-generated content, we explored scaling, error handling, and cost reduction strategies, and demonstrated the immense potential of the API along with how to overcome its limitations. What you’ve learned here can be applied to any application or idea, letting you focus on the user experience while auto-generating the content.

At this point, we are only limited by our imagination.