16 Tools to Run Your Local LLMs With Privacy Support

This article has become a yearly favorite, so I plan to add extra value by publishing two editions this year.

All of the tools below are free, many are open-source, and there is a wide range of LLMs, SLMs, and LMMs to run with them.

For the uninitiated:

  1. LLMs - Large Language Models, which work only with text.
  2. SLMs - Small Language Models, which typically have fewer than 10B parameters.
  3. LMMs - Large Multimodal Models, which work with text, images, audio, and video.

Use perplexity.ai to learn new terms.

I prefer Perplexity over Google for practically everything these days.

I might do a follow-up article about the best models to use with these tools.

There are 16 tools to run and interact with your local LLMs, all listed below.

Explore as many of them as you can.

All of them are useful in their own way.

And, finally: Enjoy!

16 Tools to Run LLMs Locally

1. H2O LLM Studio

https://venturebeat.com/ai/h2o-ai-launches-h2ogpt-llm-studio/?embedable=true

2. LM Studio

https://www.linkedin.com/pulse/discovering-lm-studio-gateway-private-locally-hosted-ai-thyagarajan-j0off/?embedable=true

3. Ollama

https://medium.com/@mauryaanoop3/ollama-a-deep-dive-into-running-large-language-models-locally-part-1-0a4b70b30982?embedable=true
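As a taste of how simple Ollama is to use, here is a minimal sketch of calling its local REST API with only the Python standard library. It assumes Ollama is running on its default port (11434) and that a model has already been pulled; the model name `llama3.2` is just an example, so substitute one you have downloaded.

```python
import json
import urllib.request

# Ollama's default local endpoint for single-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str) -> dict:
    """Build a request body for Ollama's /api/generate endpoint.

    stream=False asks for one complete JSON response instead of a token stream.
    """
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send a prompt to a locally running Ollama server and return the reply."""
    body = json.dumps(build_payload(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    # Assumes `ollama pull llama3.2` was run beforehand; the model name is an example.
    print(generate("llama3.2", "In one sentence, what is a local LLM?"))
```

Because everything stays on localhost, no prompt or response ever leaves your machine.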

4. GPT4All

https://hackernoon.com/gpt4all-an-ecosystem-of-open-source-compressed-language-models?embedable=true

5. LocalAI

https://thenewstack.io/how-to-run-a-local-llm-via-localai-an-open-source-project/?embedable=true
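One of LocalAI's selling points is that it speaks the OpenAI API, so existing OpenAI client code works by pointing it at your own machine (several other tools on this list, such as vLLM and llamafile, offer the same compatibility). Below is a minimal stdlib-only sketch, assuming LocalAI is running on its default port (8080); the model name `local-model` is a placeholder and must match a model configured in your instance.

```python
import json
import urllib.request

# LocalAI's default local address, exposing OpenAI-compatible routes under /v1.
BASE_URL = "http://localhost:8080/v1"

def build_chat_request(model: str, user_message: str) -> dict:
    """Build an OpenAI-style /chat/completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }

def chat(model: str, user_message: str) -> str:
    """POST a chat request to an OpenAI-compatible local server and return the answer."""
    body = json.dumps(build_chat_request(model, user_message)).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.loads(resp.read())
    return data["choices"][0]["message"]["content"]

if __name__ == "__main__":
    # "local-model" is a placeholder for whatever model your LocalAI config exposes.
    print(chat("local-model", "Summarize why local inference matters for privacy."))
```

Swapping the base URL is all it takes to move an app from a cloud API to a fully local one.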

6. Jan

https://medium.com/mr-plan-publication/discover-jan-ai-the-open-source-assistant-transforming-local-ais-for-everyone-19d2e5544b38?embedable=true

7. text-generation-webui (oobabooga)

https://pyimagesearch.com/2024/07/01/exploring-oobabooga-text-generation-web-ui-installation-features-and-fine-tuning-llama-model-with-lora/?embedable=true

8. PrivateGPT

https://litslink.com/blog/what-is-private-gpt?embedable=true

9. vLLM

https://ai.gopubby.com/running-llms-locally-on-the-mac-using-vllm-b128e06d5dbd?embedable=true

10. MLC LLM

https://www.restack.io/p/mlc-llm-answer-local-llm-android-cat-ai?embedable=true

11. llama.cpp

https://medium.com/@jankammerath/the-resurgence-of-c-through-llama-cpp-cuda-metal-8d2322cd8ded?embedable=true

12. ExLlamaV2

https://medium.com/data-science/exllamav2-the-fastest-library-to-run-llms-32aeda294d26?embedable=true

13. llamafile

https://simonw.substack.com/p/llamafile-is-the-new-best-way-to?utm_source=profile&utm_medium=reader2&embedable=true

14. WebLLM

https://techhub.iodigital.com/articles/what-is-webllm?embedable=true

15. Hugging Face Transformers

https://www.datacamp.com/tutorial/what-is-hugging-face?embedable=true

16. Hugging Face App Market (Spaces)

https://coda.io/@peter-sigurdson/hugging-face-spaces?embedable=true

Conclusion

There is no sector that LLMs will not disrupt.

With the correct guidance, Generative AI will reshape the world as we know it.

Often, you may find yourself in a situation where you do not want your data to leave your system.

This is especially true for enterprises and governments.

At such times, these tools will be invaluable.

Cutting-edge research is another area where you do not want your data to leave your enterprise.

You can deploy the tool of your choice to a centralized server hosted by your company.

The server should be air-gapped from the public internet and run one of the tools on this list.

I sincerely hope you change the world with your Generative AI research.

All the best for your journey!

References

  1. Unite:

    https://www.unite.ai/best-llm-tools-to-run-models-locally/ - Unite.AI article reviewing top tools for running LLMs locally, updated for 2025.

  2. DataCamp:

    https://www.datacamp.com/tutorial/run-llms-locally-tutorial - DataCamp tutorial on methods and tools for running LLMs locally, with practical guidance.

  3. Getstream:

    https://getstream.io/blog/best-local-llm-tools - GetStream blog listing the best tools for local LLM execution, with detailed insights.

  4. H2O LLM Studio:

    https://h2o.ai/products/h2o-llm-studio/ - Official product page for H2O LLM Studio, a no-code GUI for LLM fine-tuning and deployment.

    https://github.com/h2oai/h2ogpt - GitHub repository for H2OGPT, H2O.ai's open-source large language model.

  5. LM Studio:

    https://lmstudio.ai/ - Official website for LM Studio, a user-friendly desktop application for running LLMs locally.

  6. Ollama:

    https://ollama.ai/ - Official website for Ollama, designed for simple command-line and GUI-based local LLM serving.

  7. GPT4All:

    https://gpt4all.io/ - Official website for GPT4All, providing a free and open-source ecosystem for local LLMs.

  8. LocalAI:

    https://localai.io/ - Official website for LocalAI, a self-hosted, community-driven local AI server compatible with OpenAI API.

  9. text-generation-webui (oobabooga):

    https://github.com/oobabooga/text-generation-webui - GitHub repository for text-generation-webui (oobabooga), a feature-rich web UI for local LLMs.

  10. Jan:

    https://jan.ai/ - Official website for Jan, a cross-platform AI client application with local LLM support.

  11. PrivateGPT:

    https://github.com/imartinez/privateGPT - GitHub repository for PrivateGPT, a privacy-focused tool for local document Q&A using LLMs.

  12. FastChat:

    https://github.com/lm-sys/FastChat - GitHub repository for FastChat, a research platform for training, serving, and evaluating LLMs.

  13. vLLM:

    https://vllm.ai/ - Official website for vLLM, a high-throughput and efficient LLM inference server.

  14. MLC LLM:

    https://mlc.ai/mlc-llm/ - Official website for MLC LLM, focusing on machine learning compilation for efficient LLM execution.

    https://github.com/mlc-ai/mlc-llm - GitHub repository for MLC LLM, containing code and examples for local execution.

  15. llama.cpp:

    https://github.com/ggerganov/llama.cpp - GitHub repository for llama.cpp, a project focused on efficient C++ inference of Llama models.

  16. ExLlamaV2:

    https://github.com/turboderp/exllamav2 - GitHub repository for ExLlamaV2, known for fast inference of quantized LLMs.

  17. WebLLM:

    https://webllm.mlc.ai/ - Official website for WebLLM, enabling in-browser LLM execution using WebGPU.

  18. llamafile:

    https://github.com/Mozilla-Ocho/llamafile - GitHub repository for llamafile, packaging LLMs into single executable files for easy deployment.

  19. Hugging Face Transformers:

    https://huggingface.co/docs/transformers/index - Documentation for Hugging Face Transformers library, a core Python library for NLP models.

  20. Hugging Face App Market (Spaces):

    https://huggingface.co/spaces - Hugging Face Spaces, a platform for hosting and discovering AI application demos.

Google AI Studio was used in this article. It is available at this link: https://ai.google.dev/aistudio

All images created by the Flux AI Art Generation Models at Night Cafe Studio: https://creator.nightcafe.studio/explore

While I do not monetize my writing directly, your support helps me continue putting articles like this one out without a paywall or a paid subscription.

If you want ghostwritten articles like this one appearing under your name online, you can get them!

Contact me at:

https://linkedin.com/in/thomascherickal

For your own ghostwritten article! (Prices are negotiable and I offer country-wise parity pricing.)

If you want to support my writing, consider a contribution at Patreon on this link:

https://patreon.com/c/thomascherickal/membership

Alternatively, you could buy me a coffee on this link:

https://ko-fi.com/thomascherickal

Cheers!