Hi! If you are a mobile developer who follows AI trends, you have probably wondered how to integrate large language models (LLMs) into your workflow directly in Android Studio. In this article, I will show you how to do it quickly and easily, without relying on external APIs or cloud services.

I will share a step-by-step guide on how to run a local LLM on your computer and integrate it into Android Studio. We will look at how to choose a model, how to set up the environment, and how to use it all.

🛠️ Step-by-step guide

Selecting and loading a model

Setting up the environment

Integration with Android Studio

🔍 What else did I try and conclusions



Selecting and loading a model

Searching for a model will bring up a large number of options to choose from.

Also pay attention to the GGUF and MLX checkboxes.

For Apple Silicon (M1/M2/M3/M4) it is preferable to use MLX models, as they should run better, but an MLX build is not always available for the model you want (this is not a problem if the model is small).


Also, when downloading a model, you can choose how "cut down" it is, i.e. its quantization level. Start with the smallest variant; if everything is fine and your hardware allows it, try a heavier one, and so on.

Quantized models are compressed versions of neural networks that take up less space and require fewer resources to run (e.g. on low-end hardware).

Usually they are slightly worse than the original in quality: the stronger the compression (the fewer bits per weight, e.g. 2, 4, 8), the worse the model performs. That said, 8-bit models are practically indistinguishable from the original, and 4-bit ones offer a good balance between quality and size.

It is important to remember that the number of parameters affects quality even more than the degree of quantization. That is, a model with more parameters but fewer bits per weight (e.g. a quantized 13B) will often perform better than a smaller model that is not compressed at all (e.g. an original 7B).
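To get a feel for what quantization means in practice, here is a rough back-of-envelope sketch of how much memory the weights alone take. The numbers are approximations and ignore runtime overhead such as the context/KV cache, so treat them as a lower bound:

```kotlin
// Back-of-envelope estimate: weights take roughly params * bitsPerWeight / 8 bytes.
// Real memory use is higher (runtime overhead, context/KV cache), so this is a lower bound.
fun approxWeightSizeGb(paramsBillions: Double, bitsPerWeight: Double): Double =
    paramsBillions * 1e9 * bitsPerWeight / 8 / (1024.0 * 1024 * 1024)

fun main() {
    println("8B  @ 4-bit ≈ %.1f GB".format(approxWeightSizeGb(8.0, 4.0)))   // ~3.7 GB
    println("8B  @ 8-bit ≈ %.1f GB".format(approxWeightSizeGb(8.0, 8.0)))   // ~7.5 GB
    println("13B @ 4-bit ≈ %.1f GB".format(approxWeightSizeGb(13.0, 4.0)))  // ~6.1 GB
}
```

This also illustrates the point above: a 4-bit 13B model fits in a similar memory budget as an uncompressed 7B one, while usually giving better answers.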


In this example I will use the Llama 3.1 8B Instruct 4bit model.

(If you have weaker hardware, you can try the deepseek-coder-6.7b-instruct model.)

Select and download.


While we wait for the download…

Next, go to the Developer tab and click Select a model to load. Choose our model and load it into memory.


Setting up the environment

If everything is OK, the model's status should be READY and logs will appear in the console. The model is now available at http://127.0.0.1:11434.
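As a quick smoke test outside the IDE, you can send a single request to the local server. Below is a minimal Kotlin sketch, assuming the server exposes an OpenAI-compatible /v1/chat/completions endpoint on that port; the URL, path, and model name are assumptions, so adjust them to whatever your local server actually serves:

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

fun main() {
    // Hypothetical smoke test: one chat request to the local server started above.
    // Assumes an OpenAI-compatible /v1/chat/completions endpoint; adjust the URL
    // and the "model" value to what your server reports.
    val body = """
        {
          "model": "llama-3.1-8b-instruct",
          "messages": [{"role": "user", "content": "Say hello in one short sentence."}]
        }
    """.trimIndent()

    val request = HttpRequest.newBuilder()
        .uri(URI.create("http://127.0.0.1:11434/v1/chat/completions"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build()

    val response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString())

    println("HTTP ${response.statusCode()}")
    println(response.body())
}
```

If you get an HTTP 200 with a JSON body containing the model's reply, the server side is ready and we can move on to the IDE.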


Integration with Android Studio




For the IDE side I use the Continue plugin. In its config we can assign our model to different roles. Requests from Android Studio are now redirected to the local server. Let's check: everything works.
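For reference, a model entry in the Continue config might look roughly like the sketch below. This is only an illustration: the exact file (config.json or config.yaml), field names, and the available roles depend on the plugin version, so check Continue's documentation for your version.

```yaml
# Illustrative sketch only; field names and roles vary between Continue versions.
models:
  - name: Llama 3.1 8B Instruct         # display name shown in the IDE
    provider: ollama                    # the provider the plugin expects on port 11434
    model: llama-3.1-8b-instruct        # identifier under which your local server serves the model
    apiBase: http://127.0.0.1:11434     # the local server from the previous step
    roles: [chat, edit, autocomplete]   # which operations this model handles
```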





Also, in the same menu, under Tools, you can configure the behavior for each operation.


The Continue plugin has its own bugs, too. For example, the Ollama port is hardcoded in the plugin, so from the IDE it talks strictly to 11434. I saw similar issues already reported, so rather than wait for fixes, it was simpler to run the local server on that exact port.

🔍 What else did I try?

Paid ChatGPT:

You can use it via the native app with its apply-code feature, but it does not work well with Android Studio: sometimes the code is not applied, or it crashes when switching tabs. Alternatively, ChatGPT (via the OpenAI API) can be used with the current setup in the Continue plugin; just switch to the online model.

Paid Cursor:

There are several models under the hood, but the best one is still claude-3.7; switching to gpt-mini/o4-mini and the like does not make much sense. It is also rather awkward to work in their IDE. Yes, they support extensions for Gradle and Kotlin, but you cannot really build and run an Android project normally from their environment. I ran into a situation where two Gradle daemons and two Java processes were running, one from Cursor and the other from Android Studio, and together they ate up almost all the memory. Well, there is nothing more convenient than Android Studio for mobile development :).

Alternatively, you can give it limited access to a project module or a set of files and ask it to perform a specific task, and the agent will handle it on its own. I often found that it did one thing wrong, then another, then broke something, and so on. It gets especially difficult when more than 10 files are involved.

What’s next?



Conclusions

Running an LLM for Android Studio locally is not only convenient, it also significantly expands your capabilities as a developer. Of course, the solutions are still a bit "raw" at the moment, but the potential is huge, and you can already feel the real advantages of AI right on your own hardware.

🔥 Let’s discuss!

Tell me how you managed to integrate a local LLM. Which models turned out to be the most convenient? (Ideally, mention your hardware, what worked for you, and the size of the project.)

Share your thoughts and ask questions in the comments :-)

Was it interesting? Do you have any questions? (my linkedin)