This is the breakout year for Generative AI!

Yes, this article’s been written by a violinist and there’s new content! Spectacular new and awesome content!


Well; to say the very least, this year, I’ve been spoiled for choice as to how to run an LLM Model locally.

Let’s start!


1) HuggingFace Transformers


Hugging Face Transformers is a state-of-the-art machine learning library that provides easy access to a wide range of pre-trained models for Natural Language Processing (NLP), Computer Vision, Audio tasks, and more. It’s an open-source library developed by Hugging Face, a company that has built a strong community around machine learning and NLP.


Here are some key features of Hugging Face Transformers:


  1. Pre-trained Models: Transformers provides APIs and tools to easily download and train state-of-the-art pre-trained models. Using these models can reduce your compute costs, carbon footprint, and save you the time and resources required to train a model from scratch.
  2. Multimodal Support: These models support common tasks in different modalities, such as text classification, named entity recognition, question answering, language modeling, summarization, translation, multiple choice, text generation for NLP; image classification, object detection, and segmentation for Computer Vision; automatic speech recognition and audio classification for Audio; and table question answering, optical character recognition, information extraction from scanned documents, video classification, and visual question answering for Multimodal tasks.
  3. Framework Interoperability: Transformers support framework interoperability between PyTorch, TensorFlow, and JAX. This provides the flexibility to use a different framework at each stage of a model’s life; train a model in one framework, and load it for inference in another.
  4. Community and Collaboration: Hugging Face has a strong community focus, with a Model Hub that serves as a bustling hub where users exchange and discover thousands of models and datasets, fostering a culture of collective innovation in NLP.
  5. Ease of Use: The Transformers library simplifies the machine learning journey, offering developers an efficient pathway to download, train, and seamlessly integrate machine learning models into their workflows.
  6. Diverse Datasets: The Datasets library functions as a comprehensive toolbox, offering diverse datasets for developers to train and test language models effortlessly.

In summary, Hugging Face Transformers is a powerful tool that makes the complexities of language technology and machine learning accessible to everyone, from beginners to experts.


2) gpt4all


GPT4All is an open-source ecosystem developed by Nomic AI that allows you to run powerful and customized large language models (LLMs) locally on consumer-grade CPUs and any GPU. It aims to be the best instruction-tuned assistant-style language model that any person or enterprise can freely use, distribute, and build on.


Here are some key features of GPT4All:

  1. Locally Running: GPT4All runs locally on your machine, which means it doesn’t require an internet connection or a GPU. This makes it a privacy-aware chatbot.
  2. Free-to-Use: It’s free to use, which means you don’t have to pay for a platform or hardware subscription.
  3. Customized Large Language Models: GPT4All allows you to train and deploy powerful and customized large language models.
  4. Various Capabilities: GPT4All can answer questions about the world, assist in writing emails, documents, creative stories, poems, songs, and plays, understand documents, and provide guidance on easy coding tasks.
  5. Open-Source Ecosystem: GPT4All is part of an open-source ecosystem, which means you can contribute to its development and improvement.

In summary, GPT4All is a tool that democratizes access to AI resources by allowing anyone to run large language models locally on their own hardware.


The website is (unsurprisingly)

https://gpt4all.io

Like all the LLMs on this list (when configured correctly), gpt4all does not require Internet or a GPU.


3) ollama


Ollama is an open-source command line tool that allows you to run, create, and share large language models on your computer. It’s designed to run open-source large language models locally on your machine2. Ollama supports various models such as Llama 3, Mistral, Gemma, and others.


It simplifies the process by bundling model weights, configuration, and data into a single package defined by a Modelfile. This means you can run large language models, such as Llama 2 and Code Llama, without any registration or waiting list.


Ollama is available for macOS, Linux, and Windows. It started by supporting Llama, then expanded its model library to include models like Mistral and Phi-25. So, if you’re interested in running large language models on your local machine, Ollama could be a great tool to consider!


Ollama’s multimodal capabilities allow it to process both text and image inputs together. This means you can ask the model questions or give it prompts that involve both text and images, and it will generate responses based on both types of input.

To use this feature, you simply type your prompt and then drag and drop an image. There is a new images parameter for both Ollama’s Generate API & Chat API. The images parameter takes a list of base64 encoded PNG or JPEG format images. Ollama supports image sizes up to 100MB.


For example, you can run a multimodal model like LLaVA by typing ollama run llava in the terminal. In the background, Ollama will download the LLaVA 7B model and run it. If you want to use a different parameter size, you can try the 13B model using ollama run llava:13b.


More multimodal models are becoming available, such as BakLLaVA 7B. You can run it by typing ollama run bakllava in the terminal.

This multimodal capability makes Ollama a powerful tool for generating creative and contextually relevant responses based on a combination of text and image inputs. It’s a great way to leverage the power of large language models in a more interactive and engaging way.

https://ollama.com/?embedable=true


4. localllm


I find that this is the most convenient way of all. The full explanation is given on the link below:




localllm is a tool developed by Google Cloud Platform that allows you to run Large Language Models (LLMs) locally on Cloud Workstations. It’s particularly useful for developers who want to leverage the power of LLMs without the constraints of GPU availability.


Here are some key points about localllm:

In essence, localllm is a game-changer for developers seeking to leverage LLMs in their applications, offering a flexible and efficient approach to application development.


https://github.com/GoogleCloudPlatform/localllm?embedable=true


5. Llama 3 (Version 3 released from Meta)

Meta’s Llama 3, often referred to as Llama 3, is the latest iteration of Meta’s Large Language Model (LLM). It’s a powerful AI tool that has made significant strides in the field of Natural Language Processing (NLP) and Machine Learning (ML).


Llama 3 is an open-source large language model designed for developers, researchers, and businesses to build, experiment, and responsibly scale their generative AI ideas. It’s part of a foundational system and serves as a bedrock for innovation in the global community.


This model is available in both 8B and 70B pretrained and instruction-tuned versions to support a wide range of applications. It excels at understanding language nuances and performing complex tasks like translation and dialogue generation.


Llama 3 has been trained on over 15 trillion tokens of data, a training dataset 7 times larger than that used for Llama 2, including 4 times more code. This results in the most capable Llama model yet, which supports an 8K context length that doubles the capacity of Llama 2.


With the release of Llama 3, Meta has updated the Responsible Use Guide (RUG) to provide the most comprehensive information on responsible development with LLMs. Their system-centric approach includes updates to their trust and safety tools with Llama Guard 2, optimized to support the newly announced taxonomy published by MLCommons expanding its coverage to a more comprehensive set of safety categories, Code Shield, and Cybersec Eval 2.


In summary, Llama 3 represents a significant advancement in AI technology, offering enhanced capabilities and improved performance for a broad range of applications.

Link: https://huggingface.co/meta-llama


6. LLM


The llm Python script from PyPI is a command-line utility and Python library for interacting with Large Language Models (LLMs), including OpenAI, PaLM, and local models installed on your own machine.


Here’s a brief overview of its functionalities:

  1. Interacting with LLMs: You can run prompts from the command-line, store the results in SQLite, generate embeddings, and more.
  2. Support for Self-Hosted Language Models: The LLM CLI tool now supports self-hosted language models via plugins.
  3. Working with Embeddings: LLM provides tools for working with embeddings.
  4. Installation: You can install this tool using pip with the command pip install llm or using Homebrew with the command brew install llm.
  5. Getting Started: If you have an OpenAI API key, you can get started using the OpenAI models right away. As an alternative to OpenAI, you can install plugins to access models by other providers, including models that can be installed and run on your own device.
  6. Installing a Model Locally: LLM plugins can add support for alternative models, including models that run on your own machine. For example, to download and run Mistral 7B Instruct locally, you can install the llm-gpt4all plugin.
  7. Running a Prompt: Once you’ve saved a key, you can run a prompt like this: llm "Five cute names for a pet penguin".
  8. Chatting with a Model: You can also start a chat session with the model using the llm chat command.
  9. Using a System Prompt: You can use the -s/--system option to set a system prompt, providing instructions for processing other input to the tool.


https://github.com/simonw/llm?embedable=true


7. LlamaFile


A llamafile is an executable Large Language Model (LLM) that you can run on your own computer. It contains the weights for a given open LLM, as well as everything needed to actually run that model on your computer.


The goal of a llamafile is to make open LLMs much more accessible to both developers and end users. It combines the model with a framework into one single-file executable (called a “llamafile”) that runs locally on most computers, with no installation1. This means all the operations happen locally and no data ever leaves your computer.


For example, you can download an example llamafile for the LLaVA model, which is a new LLM that can do more than just chat; you can also upload images and ask it questions about them.


In addition to hosting a web UI chat server, when a llamafile is started, it also provides an OpenAI API compatible chat completions endpoint. This is designed to support the most common OpenAI API use cases, in a way that runs entirely locally.


This initiative is part of an effort to ensure that AI remains free and open, and to increase both trust and safety. It aims to lower barriers to entry, increase user choice, and put the technology directly into the hands of the people.


https://github.com/Mozilla-Ocho/llamafile?embedable=true


8. ChatGPT-Next-Web


ChatGPT-Next-Web, also known as NextChat, is an open-source application that provides a user interface for interacting with advanced language models like GPT-3.5, GPT-4, and Gemini-Pro. It’s built on Next.js, a popular JavaScript framework.


Here are some key features of ChatGPT-Next-Web:

Deployment: It can be deployed for free with one-click on Vercel in under 1 minute.

It’s a versatile tool that allows users to interact with AI models in a more personalized and controlled manner. It’s being used by developers and AI enthusiasts around the world to create and deploy their own ChatGPT applications.


Website: https://github.com/ChatGPTNextWeb/ChatGPT-Next-Web


9. LocalAI



https://localai.io


10. gpt4free


https://gpt4free.io


11. PrivateGPT


PrivateGPT is an open-source, production-ready AI project that allows you to interact with your documents using the power of Large Language Models (LLMs), 100% privately.


Here are some key features of PrivateGPT:

PrivateGPT is a versatile tool that allows users to interact with AI models in a more personalized and controlled manner. It’s being used by developers and AI enthusiasts around the world to create and deploy their own AI applications.

https://github.com/zylon-ai/private-gpt?embedable=true


12. Text-Generation-WebUI


Text-Generation-WebUI, also known as TGW or “oobabooga”, is an open-source project that provides a Gradio-based web interface for interacting with Large Language Models.


Here are some key features of Text-Generation-WebUI:

https://github.com/oobabooga/text-generation-webui?embedable=true


13. H2O.ai


H2O.ai is a company that provides AI solutions and is known for democratizing AI. They offer a range of products and services, including:

H2O Open Source: This is a fully open source, distributed in-memory machine learning platform with linear scalability. It supports the most widely used statistical & machine learning algorithms including gradient boosted machines, generalized linear models, deep learning and more. It also has an industry leading AutoML functionality that automatically runs through all the algorithms and their hyperparameters to produce a leaderboard of the best models.

H2O.ai Cloud Platform: This is a cloud-based platform for creating and deploying generative AI solutions with customizable large language models (LLMs). It offers features like information retrieval on internal data, data extraction, summarization, and other batch processing tasks, as well as code/SQL generation for data analysis. It also provides a customizable evaluation and validation framework that is model agnostic.

Enterprise Support: When AI becomes mission critical for enterprise success, H2O.ai provides the services you need to optimize your investments in people and technology to deliver on your AI vision.

PrivateGPT: This is a product that allows you to interact with your documents using the power of Large Language Models (LLMs), 100% privately. It provides an API offering all the primitives required to build private, context-aware AI applications.


H2O.ai is used by over 18,000 organizations globally and is extremely popular in both the R & Python communities. It’s a powerful tool for those who want to run AI models locally without the need for internet connectivity or expensive hardware.


14. LightLLM


LightLLM is a Python-based Large Language Model (LLM) inference and serving framework. It’s known for its lightweight design, easy scalability, and high-speed performance.


Here are some key features of LightLLM:

LightLLM supports a wide range of models including BLOOM, LLaMA, LLaMA V2, StarCoder, Qwen-7b, ChatGLM2-6b, Baichuan-7b, Baichuan2-7b, Baichuan2-13b, Baichuan-13b, InternLM-7b, Yi-34b, Qwen-VL, Qwen-VL-Chat, Llava-7b, Llava-13b, Mixtral, Stablelm, MiniCPM.

https://github.com/ModelTC/lightllm?embedable=true


15. GPT Academic


GPT Academic, also known as gpt_academic, is an open-source project that provides a practical interaction interface for Large Language Models (LLMs) like GPT and GLM. It’s particularly optimized for academic reading, writing, and code explanation.


Here are some key features of GPT Academic:

https://academicgpt.net


In Conclusion

Now, let’s take a moment to appreciate the sheer genius of these LLMs. They’re like the Swiss Army knives of the AI world - versatile, handy, and always ready to impress with their multitude of uses. And the best part? They’re local! That’s right, no need to venture into the vast and sometimes scary world of the internet. These models are right at home on your local machine, ready to serve at a moment’s notice.


But let’s not forget the real heroes of our story - the programmers. Those tireless warriors of the digital realm, armed with nothing but their keyboards and an unquenchable thirst for knowledge. They’re the ones who’ve tamed these wild beasts of AI, turning lines of incomprehensible code into something we can all use and appreciate. So here’s to you, dear programmers. May your coffee be strong, your bugs few, and your StackOverflow answers always upvoted.


So, here’s to Local LLMs - the truly unsung heroes of the AI world. May they continue to inspire, amaze, and occasionally confuse us for many years to come. And remember, in the world of AI, the only limit is your imagination. So dream big, code hard, and don’t forget to laugh along the way. After all, as they say in the world of programming, “Laughter is the best exception handler.”


Cheers!



All Images Generated With Bing Image Creator. It’s Awesome! (DALL-E-3)