
The lead image for this article was generated by HackerNoon's AI Image Generator via the prompt "a robot using an old desktop computer"

There's something new in the AI space. In this post, I'll walk you through the process of installing and setting up PrivateGPT.

What is PrivateGPT?

PrivateGPT is a tool that lets you query your documents locally, with no internet connection required. Whether you're a researcher, a developer, or just curious about document-querying tools, PrivateGPT gives you a private way to work with your files, because nothing leaves your machine. This tutorial accompanies a YouTube video, where you can find a step-by-step demonstration of the installation process!

https://www.youtube.com/watch?v=ZHrdCKqirKM&embedable=true

Prerequisites:

Python 3.10 or later installed on your system or virtual env
Basic knowledge of the command-line interface (CLI/terminal)
Git installed
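If you want to double-check the prerequisites before going further, here is a minimal sketch in Python. The function name check_prerequisites is my own (hypothetical), and it only looks at the Python version and whether git is on your PATH, nothing more.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # taken from the prerequisites list above


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of problems; an empty list means everything looks fine."""
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {version_info[0]}.{version_info[1]}"
        )
    # shutil.which returns None when the executable is not on PATH
    if which("git") is None:
        problems.append("git was not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    print("All prerequisites met." if not issues else "\n".join(issues))
```

Save it under any file name you like and run it with the same interpreter you plan to use for the project.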

First, let's create a virtual environment. Create a folder for the project, for example on your desktop (I called mine 'blog_projects'). Open a terminal in that folder, or navigate to it from the command line, then follow the steps below.

1. Create a virtual environment:

Open your terminal and navigate to the desired directory.
Run the following command to create a virtual environment (replace myenv with your preferred name):

python3 -m venv myenv

This creates a virtual environment named 'myenv'. On Windows, the interpreter is often invoked as python rather than python3.

2. Activate the virtual environment:

On macOS and Linux, use the following command:

source myenv/bin/activate

On Windows, use the following command:

myenv\Scripts\activate
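To confirm the activation worked, you can ask Python itself. This is a small sketch that relies on the fact that, inside an environment created with python -m venv, sys.prefix differs from sys.base_prefix. The function name in_virtualenv is just an illustrative choice.

```python
import sys


def in_virtualenv(prefix=sys.prefix, base_prefix=sys.base_prefix):
    """True when the interpreter runs from a venv-style virtual environment."""
    return prefix != base_prefix


print("virtual environment active:", in_virtualenv())
```

If this prints False right after activation, the terminal is probably still using the system-wide interpreter.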

3. Run the git clone command to clone the repository:

git clone https://github.com/imartinez/privateGPT.git

By creating and activating the virtual environment before cloning the repository, we ensure that the project dependencies will be installed and managed within this environment. This helps maintain a clean and isolated development environment specific to this project.

After cloning the repository, you can proceed to install the project dependencies and start working on the project within the activated virtual environment.

If you skipped step 3, copy the repository URL from GitHub and run the git clone command from the folder where you want your project to live.

Once the clone finishes, navigate into the project folder it created (named 'privateGPT' by default), since that is where requirements.txt lives:

cd privateGPT

Run the following command to install the required dependencies:

pip install -r requirements.txt

Next, download the LLM and place it in a directory of your choice. The default model is 'ggml-gpt4all-j-v1.3-groovy.bin', but if you prefer a different GPT4All-J compatible model, you can download it and reference it in your .env file.

Rename the 'example.env' file to '.env' and edit the variables appropriately.

Set the 'MODEL_TYPE' variable to either 'LlamaCpp' or 'GPT4All', depending on the model you're using.
Set the 'PERSIST_DIRECTORY' variable to the folder where you want your vector store to be stored.
Set the 'MODEL_PATH' variable to the path of your GPT4All- or LlamaCpp-supported LLM.
Set the 'MODEL_N_CTX' variable to the maximum token limit for the LLM.
Set the 'EMBEDDINGS_MODEL_NAME' variable to the SentenceTransformers embeddings model name (refer to https://www.sbert.net/docs/pretrained_models.html).

Make sure you create a 'models' folder in your project and place the model you downloaded in it, so that the location matches your 'MODEL_PATH' setting.
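A typo in the .env file is an easy way to lose time, so here is a rough sketch of a sanity check for the five variables described above. The parser is a deliberately simple stand-in written for this example (it is not how PrivateGPT itself reads the file), the helper names are hypothetical, and the values in EXAMPLE_ENV are illustrative assumptions rather than required settings.

```python
from pathlib import Path

REQUIRED_KEYS = {
    "MODEL_TYPE", "PERSIST_DIRECTORY", "MODEL_PATH",
    "MODEL_N_CTX", "EMBEDDINGS_MODEL_NAME",
}
ALLOWED_MODEL_TYPES = {"LlamaCpp", "GPT4All"}


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip("'\"")
    return settings


def validate(settings, check_files=False):
    """Return a list of problems found in the parsed settings."""
    problems = [f"missing {k}" for k in sorted(REQUIRED_KEYS - set(settings))]
    model_type = settings.get("MODEL_TYPE")
    if model_type is not None and model_type not in ALLOWED_MODEL_TYPES:
        problems.append(f"MODEL_TYPE must be one of {sorted(ALLOWED_MODEL_TYPES)}")
    n_ctx = settings.get("MODEL_N_CTX")
    if n_ctx is not None and not n_ctx.isdigit():
        problems.append("MODEL_N_CTX must be a whole number")
    # Optional: confirm the model file really exists at MODEL_PATH
    if check_files and "MODEL_PATH" in settings:
        if not Path(settings["MODEL_PATH"]).is_file():
            problems.append(f"model file not found: {settings['MODEL_PATH']}")
    return problems


# Illustrative values only (assumptions); adjust them to your own setup.
EXAMPLE_ENV = """\
MODEL_TYPE=GPT4All
PERSIST_DIRECTORY=db
MODEL_PATH=models/ggml-gpt4all-j-v1.3-groovy.bin
MODEL_N_CTX=1000
EMBEDDINGS_MODEL_NAME=all-MiniLM-L6-v2
"""

print(validate(parse_env(EXAMPLE_ENV)))
```

To check your real file, you could pass Path(".env").read_text() to parse_env and call validate with check_files=True from the project folder.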

PrivateGPT comes with a sample dataset that uses a 'state of the union transcript' as an example. However, you can also ingest your own dataset. Let me show you how.

Put all your files into the 'source_documents' directory.
Make sure your files have one of the supported extensions: CSV, Word Document (docx, doc), EverNote (enex), Email (eml), EPub (epub), HTML File (html), Markdown (md), Outlook Message (msg), Open Document Text (odt), Portable Document Format (PDF), PowerPoint Document (pptx, ppt), Text file (txt).
Run the following command to ingest all the data:

python ingest.py
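Before running the ingest command, it can be worth checking that nothing in 'source_documents' has an extension outside the supported list. The sketch below does only that. The extension set is copied from the list above, and the function name unsupported_files is a hypothetical helper, not part of PrivateGPT.

```python
from pathlib import Path

# Extensions taken from the supported-formats list in this article
SUPPORTED = {
    ".csv", ".docx", ".doc", ".enex", ".eml", ".epub", ".html",
    ".md", ".msg", ".odt", ".pdf", ".pptx", ".ppt", ".txt",
}


def unsupported_files(folder="source_documents"):
    """Return relative paths of files whose extension is not in SUPPORTED."""
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        str(p.relative_to(root))
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() not in SUPPORTED
    )


if __name__ == "__main__":
    for name in unsupported_files():
        print("unsupported extension:", name)
```

The comparison is case-insensitive, so a file named REPORT.PDF counts as supported here. Whether the ingest script treats uppercase extensions the same way is something I have not verified.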

Perfect! The data ingestion process is complete. Now, let's move on to the next step!

If you get the error cannot import name 'DEFAULT_CIPHERS' from 'urllib3.util.ssl_', run the following command:

python -m pip install requests "urllib3<2"

One key thing to mention: if you add new documents to 'source_documents', you need to rerun 'python ingest.py'.
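The fix above pins urllib3 below version 2, which suggests the error shows up when a 2.x release is installed. If you'd like to see which version your environment has, here is a small sketch using the standard library's importlib.metadata. The helper names are mine, and the "major version 2 or higher" rule is an assumption based on that pin.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(ver):
    """Extract the leading integer from a version string like '1.26.16'."""
    return int(ver.split(".")[0])


def urllib3_needs_pin():
    """True when urllib3 is installed with a major version of 2 or higher."""
    try:
        installed = version("urllib3")
    except PackageNotFoundError:
        return False  # not installed, so there is nothing to pin yet
    return major_version(installed) >= 2


print("urllib3 pin recommended:", urllib3_needs_pin())
```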

---------------------------------------------------------------

Asking Questions to Your Documents:

Now comes the exciting part—asking questions to your documents using PrivateGPT. Let me show you how it's done.

Open your terminal or command prompt.
Navigate to the directory where you installed PrivateGPT.

(This is the 'privateGPT' project directory. If you type ls in your terminal, you should see the README.md file among a few others.)

Run the following command:

python privateGPT.py

Wait for the script to prompt you for input.
When prompted, enter your question!

Tricks and tips:

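Use python privateGPT.py -S to hide the sources in your output, so that only the answer is displayed instead of the answer plus its sources. Note the uppercase S and the exact file name; you can run python privateGPT.py --help to see the flags available in your version.
In the same script, on line 33, at the end of the call where you see 'verbose=False', add 'n_threads=16' (separated by a comma). This lets the model use more CPU threads, which generates text faster if your machine has that many cores to spare.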
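The value 16 in the thread tip above won't suit every machine. As a rough sketch, you can ask Python how many logical CPUs you have and derive a starting point from that. The "leave one core free" rule is my own assumption, not a PrivateGPT recommendation, and suggested_threads is a hypothetical helper.

```python
import os


def suggested_threads(cpu_count=None, reserve=1):
    """Suggest a thread count: logical CPUs minus a small reserve, at least 1."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, cpu_count - reserve)


print(f"n_threads={suggested_threads()}")
```

Treat the printed number as a starting point and adjust it if answers get slower or your computer becomes unresponsive.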

PrivateGPT Final Thoughts

This is great for anyone who wants to understand complex documents on their local computer.
This is great for private data you don't want to leak out externally.
This is particularly great for students, people new to an industry, anyone learning about taxes, or anyone learning anything complicated that they need help understanding.
However, the wait time can be 30-50 seconds or maybe even longer because you’re running it on your local computer.

How to Install PrivateGPT: A Local ChatGPT-Like Instance with No Internet Required