What is LLMjacking?

The rapid rise of Large Language Models has handed attackers a new and profitable target. LLMjacking, the unauthorized hijacking of self-hosted LLM infrastructure for malicious use, is an immediate threat that has drawn growing attention from the security community. Recent reporting on "Operation Bizarre Bazaar" describes a sophisticated campaign that scans the internet for exposed LLM and Model Context Protocol (MCP) endpoints, takes control of them, and resells the stolen resources on dark web marketplaces. More than 35,000 attack sessions have been observed, averaging 972 attacks per day. This threat is not theoretical; it is a clear and present danger to any organization running its own LLM infrastructure.

This article goes into great detail about how LLMjacking works, what weaknesses it takes advantage of, and most importantly, the code-level solutions you can use right now to protect your self-hosted LLMs from this growing threat.

The Anatomy of an LLMjacking Attack

LLMjacking is not a single exploit but a systematic, multi-stage operation. Understanding how these attacks unfold is the first step toward building effective defenses against them.

Stage 1: Scanning

Automated bots continuously scan the internet for open ports and services associated with popular self-hosted LLM frameworks. The primary targets include Ollama instances listening on their default port (11434) and exposed MCP endpoints.

Stage 2: Validation

Once a potential target is identified, the attackers' infrastructure, an entity we'll call "silver.inc", evaluates the endpoint within eight hours or less. The validation procedure dispatches a tightly orchestrated sequence of API requests that confirm the endpoint responds without credentials, enumerate the models it serves, and verify that inference requests complete successfully.

Stage 3: Monetization

Validated endpoints are added to "The Unified LLM API Gateway," a marketplace that offers access to over 30 different LLMs, all running on hijacked infrastructure. This platform is hosted on bulletproof infrastructure in the Netherlands and marketed through Discord and Telegram channels. Customers pay via cryptocurrency or payment-facilitating applications to access these stolen resources.

The financial and security implications of an LLMjacking attack are severe. Because inference on large models is computationally expensive, unauthorized usage can quickly run up enormous cloud bills. If the hijacked LLM can reach sensitive internal data through retrieval-augmented generation or function calling, the risk of data exfiltration is high. Finally, a compromised LLM endpoint can serve as a foothold for lateral movement through your network.

The Root of the Problem: Insecure-by-Default Deployments

LLMjacking succeeds largely because many self-hosted LLM frameworks are designed for local, single-user development and are not secure by default in production or internet-exposed environments. The same tools that have put powerful AI in everyone's hands have also made careless deployments easy targets.

Consider Ollama, one of the most popular tools for running LLMs on your own machine. By default, Ollama listens on http://localhost:11434 with no authentication at all. That is reasonable for local development, but the moment that port is exposed to the internet, whether deliberately or by mistake, it becomes a major security hole.
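
As a quick sanity check, the short Python sketch below probes an Ollama endpoint's /api/tags route, which lists installed models, to see whether it answers without credentials. The host name is a placeholder, and the third-party requests library is assumed to be installed.

import requests

# Replace the placeholder host with the public address of your own server.
resp = requests.get("http://your-server:11434/api/tags", timeout=5)
if resp.ok:
    # A successful response with no credentials means the endpoint is wide open.
    print("Ollama answered without authentication:", resp.json())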

The good news is that these problems are fixable with well-established practices. The next section covers the specific steps you can take to protect your self-hosted LLM infrastructure.

A Practical Guide to Hardening Your Self-Hosted LLM

Protecting self-hosted LLM infrastructure requires a defense-in-depth approach. A firewall in front of the server is not enough; you need security controls at every layer of the stack. The following are the most important steps.

1. Never Expose Raw Endpoints: The Power of the Reverse Proxy

The single most important thing you can do to protect your self-hosted LLM is to never expose the raw, unauthenticated endpoint to the internet. Always place a reverse proxy, such as Nginx or Caddy, in front of your LLM service. The proxy adds critical security controls such as authentication, TLS encryption, and rate limiting.

How to Protect Ollama with Nginx and API Key Authentication

Ollama is a great way to run LLMs locally, but it should never be exposed directly to the web. Here is how to use Nginx to add a layer of API key authentication in front of your Ollama instance.

To start, decide which API keys you will accept. In this example we keep them in a reference file called apikeys.txt; the same keys are embedded in the Nginx map below:

# apikeys.txt
my-secret-api-key-1
another-valid-key

Next, you will need to configure Nginx to act as a reverse proxy and check for a valid API key in the Authorization header of incoming requests. Here is a sample Nginx configuration file:


# /etc/nginx/sites-available/ollama.conf
# Map the Authorization header to a flag that marks whether the supplied API key is valid.
map $http_authorization $api_key_valid {
    default 0;
    "Bearer my-secret-api-key-1" 1;
    "Bearer another-valid-key"   1;
}

server {
    listen 80;
    server_name your-llm-domain.com;

    location / {
        # Reject requests that do not carry a valid API key.
        if ($api_key_valid = 0) {
            return 401;
        }
        # Proxy authenticated requests to the Ollama backend.
        proxy_pass http://localhost:11434;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        # Raise the timeouts for long-running inference requests.
        proxy_read_timeout 300s;
        proxy_connect_timeout 75s;
    }
}

This does two important things:

  1. It uses the map directive to build a variable called $api_key_valid. The variable is set to 1 when the Authorization header contains a valid key and 0 otherwise.
  2. It checks $api_key_valid in the location block. If the value is 0, the request is rejected with a 401 Unauthorized error; otherwise, it is proxied to the Ollama backend.

This setup means that every request to your Ollama instance must include a valid API key in the Authorization header:

curl http://your-llm-domain.com/api/generate \
  -H "Authorization: Bearer my-secret-api-key-1" \
  -d '{ "model": "llama2", "prompt": "Why is the sky blue?" }'

2. Securing MCP

The Model Context Protocol is a powerful way to build AI agents, but it also expands your attack surface. An unprotected MCP server gives attackers access not only to your LLMs but to every tool and API the server is configured to use. Protecting an MCP server requires several layers of security:

Isolation at the Network Level

Whenever possible, your MCP server should not be directly reachable from the public internet. If only a small number of clients need access, restrict it with a VPN or other network-level controls. When deploying in the cloud, use security groups or network policies so that only approved services can talk to your MCP server.
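
A simple application-level complement to these controls, sketched below in Python, is to bind the service to the loopback interface rather than all interfaces, so remote clients can only reach it through a VPN, SSH tunnel, or authenticated reverse proxy. The handler and port are illustrative placeholders, not part of any MCP implementation.

from http.server import BaseHTTPRequestHandler, HTTPServer

class PlaceholderHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Stand-in for the real internal service logic.
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"internal service\n")

# Binding to 127.0.0.1 (not 0.0.0.0) keeps the service off public interfaces;
# clients must come in through a VPN, SSH tunnel, or authenticated reverse proxy.
HTTPServer(("127.0.0.1", 8080), PlaceholderHandler).serve_forever()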

Authentication and Authorization

MCP does not prescribe an authentication mechanism, so you must choose and implement one yourself. At a minimum, require a unique, unguessable API key on every request. For more sensitive deployments, consider a stronger mechanism such as OAuth 2.0 or mutual TLS.

You should also enforce a fine-grained permission model that controls which clients can use which tools. For instance, a "read-only" client might be limited to tools that retrieve information, while a "read-write" client may also use tools that modify data.
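
As a minimal sketch of that idea, the Python snippet below checks a presented API key in constant time and enforces a per-client tool allowlist before a tool call is dispatched. The keys, roles, and tool names are hypothetical; a production deployment would back this with OAuth 2.0 or mutual TLS rather than hard-coded keys.

import hmac

# Hypothetical keys mapped to roles; store these securely in practice.
API_KEYS = {
    "reporting-client-key": "read-only",
    "automation-client-key": "read-write",
}

# Which tools each role may invoke.
TOOL_PERMISSIONS = {
    "read-only": {"search_documents", "get_ticket"},
    "read-write": {"search_documents", "get_ticket", "update_ticket"},
}

def authorize(api_key: str, tool_name: str) -> bool:
    # Compare keys in constant time to avoid leaking prefixes via timing.
    role = next(
        (r for k, r in API_KEYS.items() if hmac.compare_digest(api_key, k)),
        None,
    )
    if role is None:
        return False  # unknown key: reject the request outright
    return tool_name in TOOL_PERMISSIONS[role]

# The read-only client may search, but not modify data.
assert authorize("reporting-client-key", "search_documents")
assert not authorize("reporting-client-key", "update_ticket")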

Sanitizing Input and Output

Never trust client input, any more than you would in any other web application. Carefully validate and sanitize all input to your MCP server to prevent injection attacks, and sanitize the output from your tools before returning it to the client to prevent information leakage.
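
The sketch below illustrates one way to apply both halves of this advice in Python: validate tool input against a character allowlist and length cap, and redact obvious credential patterns from tool output. The patterns and limits are illustrative assumptions, not an exhaustive filter.

import re

MAX_QUERY_LENGTH = 500
SAFE_QUERY = re.compile(r"^[\w\s\-.,?@]+$")  # character allowlist for queries
SECRET_PATTERN = re.compile(r"(?i)(api[_-]?key|password|secret)\s*[:=]\s*\S+")

def validate_input(query: str) -> str:
    # Reject anything too long or containing characters outside the allowlist.
    if len(query) > MAX_QUERY_LENGTH or not SAFE_QUERY.match(query):
        raise ValueError("rejected: query failed input validation")
    return query

def sanitize_output(text: str) -> str:
    # Redact anything that looks like a credential before it reaches the client.
    return SECRET_PATTERN.sub("[REDACTED]", text)

print(sanitize_output("config loaded, api_key=sk-123456"))  # config loaded, [REDACTED]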

3. Implementing Effective Rate Limiting

Rate limiting is an effective defense against denial-of-service attacks and the kind of resource abuse seen in Operation Bizarre Bazaar. By limiting the number of requests a client can make within a given window, you prevent a single malicious actor from overloading your system or running up significant costs.

Several strategies exist for implementing rate limiting, from simple fixed-window counters to sliding windows and token buckets.

In a production system, you would want to use a more robust and scalable solution, such as Redis with sliding window counters, or a dedicated rate-limiting library like flask-limiter.
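
As a concrete example of the library route, here is a minimal sketch using Flask with flask-limiter that keys limits on the API key rather than the client IP. The route, limits, and header handling are assumptions for illustration; the proxying to the LLM backend is omitted.

from flask import Flask, request
from flask_limiter import Limiter

app = Flask(__name__)

def api_key_identity():
    # Rate-limit per API key rather than per source IP address.
    return request.headers.get("Authorization", "anonymous")

limiter = Limiter(key_func=api_key_identity, app=app, default_limits=["60 per minute"])

@app.route("/api/generate", methods=["POST"])
@limiter.limit("10 per minute")  # tighter limit for expensive inference calls
def generate():
    # Forward the request to the backend LLM here (omitted in this sketch).
    return {"status": "accepted"}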

Additional Security Best Practices

In addition to the primary defenses outlined previously, consider implementing the following security measures:

Monitor and Log All Access Activity

Maintain comprehensive records of all API requests, documenting the API key utilized, the originating IP address, the requested model, and the total number of tokens generated. This will assist in identifying unusual behavior and investigating potential security issues. To facilitate analysis, implement a centralized logging system such as ELK Stack or Splunk.
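
A minimal sketch of such an audit record in Python is shown below; the field names and log file are assumptions, and in practice these JSON lines would be shipped to a centralized system such as the ELK Stack or Splunk.

import json
import logging
from datetime import datetime, timezone

logging.basicConfig(filename="llm_access.log", level=logging.INFO, format="%(message)s")

def log_request(api_key: str, client_ip: str, model: str, tokens_generated: int) -> None:
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "api_key_prefix": api_key[:8],  # log a prefix only, never the full key
        "client_ip": client_ip,
        "model": model,
        "tokens_generated": tokens_generated,
    }
    logging.info(json.dumps(record))

log_request("my-secret-api-key-1", "203.0.113.7", "llama2", 512)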

Routine Security Assessments

Periodically scan your external attack surface to identify any exposed services. Use tools such as nmap or cloud-native security scanners to find misconfigured endpoints before attackers do.

Establish usage constraints

In addition to rate limiting, establish strict usage limits for each API key, for example a quota of 1 million tokens per key per month. This adds another layer of protection against resource abuse.
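
One minimal way to enforce such a quota is sketched below in Python. The in-memory counter is for illustration only; a real deployment would persist per-key usage (for example in Redis) and check it before serving each request.

from datetime import datetime, timezone

MONTHLY_TOKEN_LIMIT = 1_000_000  # e.g. 1 million tokens per key per month
usage = {}  # (api_key, "YYYY-MM") -> tokens consumed so far this month

def charge_tokens(api_key: str, tokens: int) -> bool:
    month = datetime.now(timezone.utc).strftime("%Y-%m")
    bucket = (api_key, month)
    if usage.get(bucket, 0) + tokens > MONTHLY_TOKEN_LIMIT:
        return False  # over quota: refuse the request
    usage[bucket] = usage.get(bucket, 0) + tokens
    return True

print(charge_tokens("my-secret-api-key-1", 25_000))  # True while under the monthly cap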

Keep your software updated

Keep your LLM frameworks, reverse proxies, and operating systems up to date to patch known vulnerabilities.

Conclusion: It's Time to Put Security First

The rise of LLMjacking should be a wake-up call for the AI community: we can no longer afford to treat security as an afterthought. We should keep pushing the limits of what AI can do, but we must also build systems that can withstand attack.

The Operation Bizarre Bazaar campaign shows that attackers are actively looking for and using weak LLM infrastructure on a large scale. The good news is that the defenses in this article—putting your LLM behind a reverse proxy, using strong authentication and authorization, and enforcing strict rate limiting—are all well-known security measures that don't take a lot of work to put into place.

We have the tools to build secure AI systems; it is up to us, the engineers and developers on the front lines, to use them. Take these steps today to keep your AI infrastructure secure and protect your business from the growing threat of LLMjacking.