Developing an AI-Generated Encyclopedia: Scaling Challenges, Error Handling, and Cost Reduction with OpenAI’s API

Seemingly overnight, AI has become the driving force behind almost every major app. Whether it’s Intercom’s support chatbot, Duolingo’s upcoming chatbot integration, or Zapier’s “connect a chatbot to anything” feature, AI is either already here or on its way.

Much like the invention of the calculator, we have a choice: we can either embrace this technology or risk being left behind.

In this article, we will develop an AI-generated Encyclopedia website. We won’t be manually inputting any data; instead, we will rely entirely on AI-generated information created through a procedural approach. As we progress through the article, we will gradually expand our prompt and then address important aspects such as scaling, error handling, and cost reduction. Through this demonstration, we aim to showcase the capabilities of the API and how to overcome its shortcomings when developing a procedurally created application.

Prompting types

The magic behind interfacing with ChatGPT is prompting. While this article won’t delve deeply into the topic of prompting, it’s worth mentioning the three types of OpenAI prompts:

  1. system: sets the model’s behavior and persona for the entire request

  2. user: the message (or messages) the model is asked to respond to

  3. assistant: previous model responses, supplied back to the API to provide context

It’s important to note that API requests lack context. OpenAI/ChatGPT doesn’t “remember” previous requests, and there’s no inherent way to link requests together apart from manually including user prompts and assistant response prompts in the messages array of your request.

For example:

//https://api.openai.com/v1/chat/completions
{
  "model": "gpt-3.5-turbo",
  "messages": [
    {"role": "user", "content":  "Say a random name"},
    {"role": "assistant", "content":  "Jason"},
    {"role": "user", "content":  "What name did you give me previously?"}
  ]
}
// Jason
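
In application code, this means carrying the conversation history yourself and replaying it with every call. Below is a minimal Go sketch of that pattern; the chatComplete helper is a hypothetical wrapper around the HTTP call to the chat completions endpoint:

// Carrying context across requests manually (sketch).
type Message struct {
  Role    string `json:"role"`
  Content string `json:"content"`
}

// ask appends the question to the history, calls the API, and stores the
// assistant's reply so the next request can "remember" it.
func ask(history []Message, question string) ([]Message, string, error) {
  history = append(history, Message{Role: "user", Content: question})
  reply, err := chatComplete("gpt-3.5-turbo", history) // hypothetical HTTP wrapper
  if err != nil {
    return history, "", err
  }
  history = append(history, Message{Role: "assistant", Content: reply})
  return history, reply, nil
}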

Accurate Prompting and Invalid JSON

The first challenge we need to address is decoding JSON from a ChatGPT response. Even a small error like a tab instead of a space, an extra quote, or a misplaced comma can render the entire response useless.

Since ChatGPT is primarily designed for conversation and trained to engage in dialogue, we must overcome common response failures to achieve a success rate above 90% before we even reach the error handling/retry system discussed later.

Let’s create our first simple prompt for our Encyclopedia app. The response can then be used to populate our hypothetical user interface seamlessly.

You are an encyclopedia. You will derive a topic from the user’s prompt, you will respond with a JSON object with the following fields: title, introduction, description, conclusion

//https://api.openai.com/v1/chat/completions
{
  "model": "gpt-3.5-turbo",
  "messages": [
    {"role": "system", "content":  "You are an encyclopedia. You will derive a topic from the user's prompt, you will respond with a JSON object with the following fields, title, introduction, description, conclusion"},
    {"role": "user", "content":  "Whale shark"}
  ]
}
{"response": "Here's a JSON object with information about Whale sharks:\n```\n{\n   \"introduction\": \"The whale shark (Rhincodon typus) is the largest shark and the largest fish in the world, reaching lengths of 40 feet or more. Despite its massive size, it is a filter feeder, feeding primarily on planktonic organisms.\", \n   \"description\": \"Whale sharks are found in warm waters of the world's oceans, including the Indian, Pacific, and Atlantic oceans. They have distinctive light-colored spots and stripes on their dark-colored skin. The whale shark's mouth can open up to 4 feet wide, allowing it to consume large amounts of plankton. They are known to migrate long distances, but not much is known about their breeding habits. Although they are a threatened species, they are not considered to be dangerous to humans.\", \n   \"conclusion\": \"Whale sharks are fascinating creatures that have captured the attention of many people around the world. Although they are not dangerous to humans, they are under threat from overfishing and habitat destruction. Conservation efforts are underway to protect this majestic species and ensure its survival for generations to come.\"\n}\n```"}

In the end, we obtain an almost valid JSON object, but it contains additional elements such as newline characters (\n), a conversational introduction, ``` code-fence markers, and sometimes even a conclusion. These elements are present because we are working with a chatbot trained to provide content in a human-readable format. While this format works well in a copy/paste setup within the ChatGPT UI, it poses challenges for the automated generation we are aiming for.

To address this issue, we require a more reliable strategy. Although we could try asking ChatGPT to avoid including newline characters or introductions, based on my experience, such attempts often result in lower success rates as they go against the fine-tuning of ChatGPT.

Instead, we should adopt a safer approach:

Assume ChatGPT will provide invalid JSON responses regardless of clever prompting, and defend against all possible scenarios.

By removing probability from the equation, we aim for a high success rate rather than settling for an average one. In this case, the solution is relatively straightforward. We can utilize regular expressions (Regex) to eliminate the invalid JSON characters from any response.

After thorough experimentation, I have found that the following four cleanup steps (three Regex patterns plus one string replacement) are sufficient for our purposes; a Go sketch of the full cycle follows the list. It’s important to run them in the following order and avoid combining them:

  1. \{[\s\S]*\}: matches the brace-delimited JSON object itself, letting us discard any conversational introduction or conclusion around it

  2. \\n|[^\x20-\x7e]: removes literal \n escapes and any character outside the printable ASCII range

  3. ",\\s*\\}" : Removes a trailing comma before the closing bracket (a pattern that chatGPT likes to do sometimes

  4. strings.ReplaceAll(content, `\\\"`, `\"`): ChatGPT has a habit of triple-escaping a JSON quote so that it appears escaped in the chat UI; this collapses it back to a single escaped quote
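
Put together, a minimal Go sketch of the full cleanup cycle might look like this (cleanJSON is my own name for it, and step 4 reflects my best reading of the triple-escape fix):

import (
  "regexp"
  "strings"
)

// cleanJSON runs the four cleanup steps, in order, over a raw ChatGPT
// response and returns a string that should now be valid JSON.
func cleanJSON(content string) string {
  // 1. Keep only the brace-delimited object, dropping surrounding chatter.
  if match := regexp.MustCompile(`\{[\s\S]*\}`).FindString(content); match != "" {
    content = match
  }
  // 2. Strip literal \n escapes and anything outside printable ASCII.
  content = regexp.MustCompile(`\\n|[^\x20-\x7e]`).ReplaceAllString(content, "")
  // 3. Drop a trailing comma before the closing brace.
  content = regexp.MustCompile(`,\s*\}`).ReplaceAllString(content, "}")
  // 4. Collapse triple-escaped quotes back to single escaped quotes.
  content = strings.ReplaceAll(content, `\\\"`, `\"`)
  return content
}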

With the cleanup cycle applied to the response, we can now focus our prompts on content rather than formatting. Once the response has undergone this process, we obtain a valid JSON object that is ready for decoding and further use.

{
 "title": "Whale Shark (Rhincodon typus)",
 "introduction": "The whale shark (Rhincodon typus) is the largest shark and the largest fish in the world, reaching lengths of 40 feet or more. Despite its massive size, it is a filter feeder, feeding primarily on planktonic organisms.",
 "description": "Whale sharks are found in warm waters of the world's oceans, including the Indian, Pacific, and Atlantic oceans. They have distinctive light-colored spots and stripes on their dark-colored skin. The whale shark's mouth can open up to 4 feet wide, allowing it to consume large amounts of plankton. They are known to migrate long distances, but not much is known about their breeding habits. Although they are a threatened species, they are not considered to be dangerous to humans.",
 "conclusion": "Whale sharks are fascinating creatures that have captured the attention of many people around the world. Although they are not dangerous to humans, they are under threat from overfishing and habitat destruction. Conservation efforts are underway to protect this majestic species and ensure its survival for generations to come."
}
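
Decoding the cleaned response is then a standard unmarshal. A minimal Go sketch, with a struct mirroring the schema from our system prompt:

import "encoding/json"

// Article mirrors the fields requested in the system prompt.
type Article struct {
  Title        string `json:"title"`
  Introduction string `json:"introduction"`
  Description  string `json:"description"`
  Conclusion   string `json:"conclusion"`
}

func decodeArticle(cleaned string) (Article, error) {
  var article Article
  err := json.Unmarshal([]byte(cleaned), &article)
  return article, err
}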

This approach is applicable to any prompt, regardless of the specific data or app requirements. It’s remarkably straightforward, making it virtually inexcusable for any application not to leverage AI-generated content.

Token Limitations

OpenAI measures API usage in tokens. Usage is billed per 1,000 tokens, and each model has a specific token limit. In the case of ChatGPT3.5, the token limit for a single request is 4,096 tokens. This count encompasses the system prompt, the user prompt, any previous contextual responses (as assistant prompts), as well as the resulting content.

In practice, when developing an actual application, it is highly likely that you will quickly surpass the 4,096 token limit. Unfortunately, the OpenAI API does not handle such cases gracefully. If a request exceeds the token limit, the API response will include "finish_reason": "length", indicating that the request ran out of tokens (more on this in the error handling section).

Consequently, you will receive a partially completed JSON object that cannot be easily rectified through regular expression matching.
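
In practice, it pays to detect this case before attempting any decoding. A minimal Go sketch that inspects the finish_reason field of the response:

import (
  "encoding/json"
  "errors"
)

// completionResponse is a minimal view of the chat completion response,
// just enough to detect token overflow.
type completionResponse struct {
  Choices []struct {
    Message struct {
      Content string `json:"content"`
    } `json:"message"`
    FinishReason string `json:"finish_reason"`
  } `json:"choices"`
}

func contentOrOverflow(body []byte) (string, error) {
  var resp completionResponse
  if err := json.Unmarshal(body, &resp); err != nil {
    return "", err
  }
  if len(resp.Choices) == 0 {
    return "", errors.New("no choices returned")
  }
  if resp.Choices[0].FinishReason == "length" {
    // The request ran out of tokens: the JSON is truncated, so retry
    // with smaller length limits instead of trying to repair it.
    return "", errors.New("token limit exceeded")
  }
  return resp.Choices[0].Message.Content, nil
}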

To overcome the token limitation, we have several options in our toolkit:

  1. Choose between content or context: If your application requires contextual history in the form of a conversation, it’s crucial to keep the requests and responses small. This allows you to provide several previous interactions as context while staying within the token limit.

  2. Set length limits: It’s essential to define a clear schema with specific length limits for all fields to control the size of the response. Always leave a buffer of 500–1000 tokens to ensure you don’t exceed the token limit.

  3. Scale requests outward: Structure your prompts in a sensible manner and perform them asynchronously. By separating prompts and handling them independently, you can effectively manage the token usage and avoid token overflow.


The Meat of it: Procedurally generated encyclopedia

Now that we know the basics, let’s expand on our previous example by providing a more concrete schema that includes upper limits on length to prevent token overflow. Additionally, we will try to maximize the query’s length. One of the great aspects of an online encyclopedia is the inclusion of clickable related content. So let’s add clickable sub-topics, dedicated sections, related links, citations, and more.

Since the schema becomes more complex with arrays and sub-objects, we will also provide an example. Examples are a powerful tool to guide ChatGPT in providing the desired format. However, be sure to specify that they are formatting examples to prevent ChatGPT from using the provided values as guidance. We won’t have enough tokens to spare to provide real values in the example. Every token counts at this stage.

Request One

You are an encyclopedia. You will derive a topic from the user's prompt, you will respond with a JSON object with the following fields: "title" (<100 chars), "introduction" (<2000 chars), "description" (<4000 chars), "conclusion" (<2000 chars), "citations" (array of citations as strings), "sections" (array of 5 JSON objects with the following keys: "sectionTitle" (<100 chars), "sectionDescription" (<4000 chars)). Encase any potential subtopics or keywords of interest with @ signs

formatting example: {"title": "", "introduction": "", "description": "", "conclusion": "", "citations": [""], "sections": [{"sectionTitle": "", "sectionDescription": ""}]}

Request Two

You are an encyclopedia. You will derive a topic from the user's prompt. You will respond with a JSON object with the following fields: "relatedTopics" (an array of 10 strings that relate to the provided topic), "externalLinks" (an array of strings with external links to learn more about the topic), "unrelatedTopics" (an array of 10 strings that describe random, unrelated topics)

formatting example: {"relatedTopics": [""], "unrelatedTopics": [""], "externalLinks": [""]}

//Request 1 (title, introduction, description, conclusion, citations, sections)
{
  "model": "gpt-3.5-turbo",
  "messages": [
   {"role": "system", "content": "You are an encyclopedia. You will derive a topic from the user's prompt, you will respond with a JSON object with the following fields: \"title\"(<100 chars), \"introduction\"(<2000 chars), \"description\"(<4000 chars), \"conclusion\"(<2000 chars), \"citations\": (array of citations as strings), \"sections\": (array of 5 JSON objects with the following keys: \"sectionTitle\"(<100 chars), \"sectionDescription\"(<4000 chars)). Encase any potential subtopics or keywords of interest with @ signs. Formatting example: {\"title\": \"\", \"introduction\": \"\", \"description\": \"\", \"conclusion\": \"\", \"citations\": [\"\"], \"sections\": [{\"sectionTitle\": \"\", \"sectionDescription\": \"\"}]}"},
   {"role": "user", "content":  "whale shark"}
 ]
}
//Request 2 (relatedTopics, unrelatedTopics, externalLinks)
{
  "model": "gpt-3.5-turbo",
  "messages": [
   {"role": "system", "content": "You are an encyclopedia. You will derive a topic from the user's prompt. You will respond with a JSON object with the following fields: \"relatedTopics\"(an array of 10 strings that relate to the provided topic), \"externalLinks\" (an array of strings with external links to learn more about the topic), \"unrelatedTopics\"(an array of 10 strings that describe random, unrelated topics) formatting example; {\"relatedTopics\":[\"\"], \"unrelatedTopics\":[\"\"], \"externalLinks\": [\"\"]}"},
   {"role": "user", "content":  "whale shark"}
 ]
}

With the response provided, we have successfully generated 80% of a Wikipedia page. It was that easy. Notice that we utilize the schema keys to provide information and meaning to the request (such as asking for subtopics).

The key to expanding this feature set lies in separating the content into distinct requests. By doing so, we can overcome token limitations and continue to grow the feature set as needed.
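
Because the two requests are independent, we can also fire them concurrently. A minimal Go sketch; requestEncyclopedia, requestOnePrompt, and requestTwoPrompt are hypothetical stand-ins for the HTTP call (plus cleanup and decoding) and the two system prompts above:

import "sync"

// generatePage runs both encyclopedia requests in parallel.
func generatePage(topic string) (article, links string, errs []error) {
  var wg sync.WaitGroup
  var mu sync.Mutex

  run := func(systemPrompt string, out *string) {
    defer wg.Done()
    result, err := requestEncyclopedia(systemPrompt, topic) // hypothetical helper
    mu.Lock()
    defer mu.Unlock()
    if err != nil {
      errs = append(errs, err)
      return
    }
    *out = result
  }

  wg.Add(2)
  go run(requestOnePrompt, &article) // title, sections, citations...
  go run(requestTwoPrompt, &links)   // related topics, external links...
  wg.Wait()
  return article, links, errs
}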

Multiple Generation Phases

In more advanced examples, there may be situations where you cannot generate all of your prompts simultaneously. For instance, one prompt might rely on the output of a previous prompt. In such cases, you can group your initial generation prompts together, process them, and if they succeed in generation and decoding, extract the minimum necessary data for context. It’s crucial to remain mindful of token limitations throughout the process. Let’s consider an example where we want to create a summary of all the sections. The second phase of generation might follow this structure:

// pseudocode (Go-flavored)
var contextualPrompt string
for _, section := range output.Sections {
  contextualPrompt += section.SectionTitle + ", "
}
prompt := "generate a summary about " + output.Title + " related to the following concepts: " + contextualPrompt

Alternatively, if your total token count is well below the limit, you could also include the context as user/assistant prompts, which ChatGPT will process more effectively. However, if you are constrained by the token limit, you can add the context more concisely, following the example above.

By employing multiple generation phases, you can accomplish complex tasks that require interdependent prompts. Each phase builds upon the output of the previous one, allowing you to generate comprehensive and meaningful content.

Cost and Selecting a model

In the past, integrating Artificial Intelligence into any platform required substantial funding. However, the landscape has changed significantly, and now it may only require occasional scavenging of coins from under your couch — albeit with some important considerations.

OpenAI offers a variety of models designed for various tasks. The ChatGPT3.5 chat completion model may not be the most obvious choice for returning JSON, as demonstrated in the previous sections. However, it offers distinct advantages: it is cost-effective, fast, and highly optimized. It’s not just slightly cheaper; it’s cheaper by a factor of 10 versus Davinci, and by even more versus ChatGPT4.

In our previous example, the response from OpenAI included the following usage details:

"usage": {
  "prompt_tokens": 177,
  "completion_tokens": 1070,
  "total_tokens": 1247
 }

The total cost for this request comes to roughly $0.0025 (1,247 tokens × $0.002 per 1,000 tokens), or about a quarter of a cent.

ChatGPT3.5, priced at $0.002 per 1,000 tokens, has a maximum token limit of 4,096 per request. Therefore, even if we hypothetically maximize a request to 4,096 tokens, the total cost would still be just $0.008.

A single request to ChatGPT3.5 can never exceed eight thousandths of a dollar

For this reason, it is always advisable to choose ChatGPT3.5 unless you have a specific reason to use another model.

Let’s compare the potential total cost of 4,096 tokens in each model (pricing at the time of writing):

  1. ChatGPT3.5 at $0.002 per 1,000 tokens: roughly $0.008

  2. Davinci at $0.02 per 1,000 tokens: roughly $0.08

  3. ChatGPT4 (8K context) at $0.03 per 1,000 prompt tokens and $0.06 per 1,000 completion tokens: roughly $0.12 to $0.25

Despite the hype around it, ChatGPT4 offers only minor improvements; the difference is mostly unnoticeable unless the task involves complex reasoning. Even OpenAI’s documentation states:

For many basic tasks, the difference between GPT-4 and GPT-3.5 models is not significant. However, in more complex reasoning situations, GPT-4 is much more capable than any of our previous models.

Therefore, it is recommended to refrain from using ChatGPT4 unless absolutely necessary. The extra expense is typically unjustified, and it comes with degraded stability and significantly slower response times.

The same applies to Davinci, unless you are specifically using it for training/fine-tuning. Choosing Davinci for regular usage will cause your cost per customer to skyrocket by over a factor of 10, and that doesn’t even include the training cost.

Error Handling and Logging

Procedurally generating a website by pulling JSON snippets from a Chat Completion model is not an exact science. There is a significant amount of variability in the types of responses, which can change over time. When developing a client-facing application, anything less than a 100% success rate on user interaction is simply unacceptable. To account for failures, we need a robust error handling and logging system that connects to a retry system, shielding the user from these issues.

Why Do Responses Fail?

Responses fail for a handful of recurring reasons, all of which we have touched on already: the response is not valid JSON even after the cleanup cycle, the request runs out of tokens ("finish_reason": "length"), the API itself errors out (rate limits, timeouts, outages), or the JSON is valid but missing fields from the schema.

To handle these error scenarios, implement a robust retry system with exponential backoff. This system should retry each request several times, using a content-length decrement approach for larger queries. If a request still fails, the entire phase should be retried, again with exponential backoff. Each non-API error should be logged and analyzed by a human until it no longer occurs. A retry limit must be provided to avoid infinite loops.
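
A minimal Go sketch of such a retry loop; maxRetries and the generate callback are assumptions standing in for your own generation pipeline:

import (
  "fmt"
  "time"
)

const maxRetries = 5 // assumed limit; tune to your tolerance

// retryGenerate retries a generation step with exponential backoff and a
// hard retry limit so we never loop forever.
func retryGenerate(generate func() (string, error)) (string, error) {
  backoff := time.Second
  for attempt := 1; attempt <= maxRetries; attempt++ {
    result, err := generate()
    if err == nil {
      return result, nil
    }
    // Log every failure so a human can analyze recurring patterns.
    fmt.Printf("attempt %d failed: %v (retrying in %s)\n", attempt, err, backoff)
    time.Sleep(backoff)
    backoff *= 2 // 1s, 2s, 4s, 8s, ...
  }
  return "", fmt.Errorf("generation failed after %d attempts", maxRetries)
}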

Final Tips…

Finally, there are some small things to look out for:

Conclusion

The AI revolution is transforming the landscape of our favorite apps. In this article, we developed an AI-generated Encyclopedia website with surprisingly little effort. By relying entirely on AI-generated content, we explored scaling, error handling, and cost reduction strategies, and demonstrated the immense potential of the API along with how to overcome its limitations. What you’ve learned here can be applied to any application or idea, letting you focus on the user experience while auto-generating the content.

At this point, we are only limited by our imagination.