Distributed systems contain a lot of moving parts, and it is critical to monitor telemetry data such as metrics, logs and traces to gain visibility and allow teams to determine the root cause of an issue. The goal of many observability initiatives is to increase availability and performance. Grafana Labs makes one of the most widely used open source observability stacks (Grafana for visualization, Loki for logs, Mimir for metrics, Tempo for traces, Alertmanager for alerts), and sells Grafana Cloud and Grafana Enterprise.

Grafana Mimir is an AGPLv3 licensed open source software project that, when coupled with MinIO, provides scalable, long-term storage for Prometheus metrics. Mimir was built using a microservices-based architecture that is horizontally scalable. Each microservice is referred to as a component, and Mimir runs as a single binary made up of these components. Most components are stateless and do not require any data to be persisted between restarts.

When you combine Mimir and MinIO you produce an infrastructure that is particularly well suited to meet the needs of enterprise cloud-native observability with:

Some of the core strengths of Grafana Mimir include:

Mimir was developed to be the most scalable, most performant open source time series database available. According to Grafana Labs, Mimir scales to 1 billion metrics and beyond, with query performance up to 40x faster than Cortex, the TSDB Mimir was built to replace. Cortex has been a CNCF project since 2018 and is widely used to store Prometheus metrics. When creating Mimir, Grafana Labs laid the groundwork for enterprise-ready observability with AGPLv3 licensing, access controls, and improved performance, scalability and availability.

Grafana Labs has a goal for Mimir: To be the best scalable time series database regardless of metrics format. Enterprises should be able to consume Prometheus metrics (and other metrics as other vendors collaborate) without modifying existing code.

Now that we’ve learned what Mimir is, let’s run through an introductory tutorial.

Grafana Mimir and MinIO Tutorial

This tutorial draws on an existing tutorial, Play with Grafana Mimir, to show how easy it is to get started with Mimir using Docker.

Create a copy of the Grafana Mimir repository using the Git command line:

git clone https://github.com/grafana/mimir.git

Navigate to the tutorial directory:

cd mimir/docs/sources/tutorials/play-with-grafana-mimir/

Start MinIO, Mimir, Prometheus, Grafana and NGINX:

docker compose up

This will bring up the following:

The following ports are used:

If you want to dig deeper into any configurations used in this tutorial, please see the YAML files saved to

~/mimir/docs/sources/tutorials/play-with-grafana-mimir/config/
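Because `docker compose up` stays in the foreground, it can help to check from a second terminal that Mimir is answering before opening Grafana. The sketch below is one way to do that, not part of the original tutorial. It assumes Mimir is reachable through the load balancer at http://localhost:9009 and answers on a /ready path there; the function name `wait_for_ready` is made up for this example.

```shell
# Sketch: poll an HTTP endpoint until it answers successfully.
# Assumptions (not from the tutorial text): the URL passed in is reachable
# from the host, and a successful HTTP status means "ready".
wait_for_ready() {
  url="$1"
  attempts="${2:-30}"   # how many times to try
  delay="${3:-5}"       # seconds to wait between tries
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: non-zero exit on HTTP errors; -s: no progress output
    if curl -fs -o /dev/null "$url"; then
      echo "ready after $i attempt(s): $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not ready after $attempts attempt(s): $url" >&2
  return 1
}

# Example (assumed address):
#   wait_for_ready http://localhost:9009/ready
```

If the call returns non-zero, the containers are probably still starting, or the address differs from the one assumed here.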

To access Grafana, launch a browser and open http://localhost:9000. You’ll use Grafana to view dashboards that display the status of the Mimir cluster. The dashboards query Mimir for the metrics they display. From the menu on the top left, click Dashboards, then Browse to see the dashboards that have been preloaded for the tutorial. These dashboards are from the Grafana Mimir mixin, which packages together Grafana Labs’ best practice dashboards, recording rules and alerts for monitoring Mimir.

It typically takes 3-5 minutes after we launch our tutorial containers for metrics to be displayed in Grafana dashboards. We’re also running Mimir without an ingress gateway, query-scheduler or memcached, so the related dashboards will be empty.

At this early stage of learning Mimir, start by browsing the dashboards for writes, reads, queries and object store. For example, the object store dashboard shows operations that have taken place since we brought Mimir up.

Configure a Recording Rule

Recording rules are a mechanism that precomputes frequently needed or computationally costly expressions and saves the result as a new set of time series. Follow these instructions to configure a recording rule in Mimir using Grafana.

This sum:up recording rule will display the number of Mimir instances that are up and reachable to be scraped. Once the rule is created, it will be available for querying and inclusion in dashboards.

Open the Alerting menu from the left toolbar and click “New alert rule”:

Enter the following to configure the recording rule:

  1. Choose Mimir or Loki recording rule
  2. Configure the following:
     - Rule name = sum:up
     - Choose Mimir in the Select data source field
     - Namespace = example-namespace
     - Group = example-group
     - Query Expression = sum(up)
  3. Choose Save and Exit in the upper right corner.
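For readers who prefer to keep rules in files, the form values above correspond roughly to a Prometheus-style rule group. The following is a sketch of that idea, not the file Grafana writes for you: the file name `recording-rule.yaml` is arbitrary, and the top-level `namespace` key is an assumption based on the rule-file layout that tools such as mimirtool accept.

```shell
# Sketch: write the sum:up recording rule as a Prometheus-style rule file.
# Assumptions: the file name and the top-level `namespace` key.
cat > recording-rule.yaml <<'EOF'
namespace: example-namespace
groups:
  - name: example-group
    rules:
      # Precompute sum(up) and store the result as the series sum:up
      - record: sum:up
        expr: sum(up)
EOF

echo "wrote recording-rule.yaml"
```

The `record` field is the name of the new series and `expr` is the PromQL expression, matching the Rule name and Query Expression fields in the form.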

To verify that your new recording rule runs correctly, open Explore from the left hand menu:

In the Metric dropdown, choose sum:up, then click Run query from the top right, then click on the Inspector button. Below, click Data to see a list of times and query results. The result should be “3”, indicating that the three local instances of Mimir are operational.
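The same check can be made from the command line through Mimir's Prometheus-compatible query API. This is a minimal sketch under stated assumptions: the base URL http://localhost:9009, the /prometheus path prefix, and the tenant ID `demo` are not given in this article and may differ in your setup; the function name `mimir_instant_query` is made up for this example.

```shell
# Sketch: run an instant PromQL query against Mimir over HTTP.
# Assumptions: base URL, /prometheus prefix, and tenant ID "demo".
MIMIR_URL="${MIMIR_URL:-http://localhost:9009}"
MIMIR_TENANT="${MIMIR_TENANT:-demo}"

mimir_instant_query() {
  # -G sends the url-encoded data as a query string on a GET request
  curl -sG \
    -H "X-Scope-OrgID: ${MIMIR_TENANT}" \
    --data-urlencode "query=$1" \
    "${MIMIR_URL}/prometheus/api/v1/query"
}

# Example:
#   mimir_instant_query 'sum:up'
```

The response is JSON in the Prometheus API format, where each result carries a `value` pair of timestamp and sample value. With all three instances up, the sample value should match the "3" shown in Grafana.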

Configure an Alert Rule

Alerting rules in Mimir use the same format as Prometheus alerting rules, with a PromQL expression as the condition. The expression is evaluated periodically and, when the condition holds, an alert fires through Alertmanager. We dug into this pretty deeply in an earlier blog post, Multi-Cloud Monitoring and Alerting with Prometheus and Grafana.

We’re going to create an alert that fires when any Mimir instance stops reporting as up, that is, when fewer than three instances are running.

In the left hand menu, hover over Alerting and then click New alert rule.

  1. Choose Mimir or Loki alert
  2. Configure the following:
     - Rule name = MimirNotRunning
     - Choose Mimir in the Select data source field
     - Namespace = example-namespace
     - Group = example-group
     - Query Expression = up == 0
  3. Choose Save and Exit in the upper right corner.
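Written as a rule file, the group now holds both the recording rule and the alert. The sketch below shows one plausible shape for it. The file name, the top-level `namespace` key, and the `for: 1m` pending duration are assumptions chosen for illustration; the form above does not state a duration.

```shell
# Sketch: example-group with the recording rule and the alert rule together.
# Assumptions: file name, top-level `namespace` key, and the `for` duration.
cat > example-group.yaml <<'EOF'
namespace: example-namespace
groups:
  - name: example-group
    rules:
      - record: sum:up
        expr: sum(up)
      # Matches every scraped target whose `up` series is 0
      - alert: MimirNotRunning
        expr: up == 0
        for: 1m   # assumed pending duration before the alert fires
EOF

echo "wrote example-group.yaml"
```

Because `up == 0` returns one series per unreachable target, the alert produces one instance per Mimir container that is down, instead of a single alert for the whole cluster.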

Navigate to the Alerting page and you will see our Mimir recording rule and alert rule. Note that there’s a nice, big, comforting green Normal status shown next to the alert because all of our Mimir containers are still running.

We’ll simulate an error condition by terminating one of the three Mimir instances (make sure that you are in the ~/mimir/docs/sources/tutorials/play-with-grafana-mimir directory):

docker compose kill mimir-3

As we abruptly terminated a Mimir instance, there will be a brief period where Grafana shows an error while querying rules. This will automatically resolve as soon as Mimir’s internal health checks detect the terminated instance as unhealthy.

After about one minute, the alert will change to the yellow Pending state.

After another minute, the alert will turn to the red Firing state.
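The Pending and Firing states can also be read outside the Grafana UI, since Prometheus-compatible alert listings are JSON with a `state` field per alert. The helper below is a rough sketch that pulls those states out of such a response with grep. The function name `alert_states` is made up, and the endpoint in the usage comment (/prometheus/api/v1/alerts on http://localhost:9009 with tenant `demo`) is an assumption about this setup.

```shell
# Sketch: print the state of each alert found in a Prometheus-style
# alerts response read from stdin. This is plain text matching,
# not a full JSON parser.
alert_states() {
  grep -o '"state": *"[a-z]*"' | cut -d'"' -f4
}

# Example (assumed address and tenant):
#   curl -s -H 'X-Scope-OrgID: demo' \
#     http://localhost:9009/prometheus/api/v1/alerts | alert_states
```

Text matching keeps the example free of extra tools. For anything beyond a quick check, a real JSON parser is the safer choice.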

If we had configured Alertmanager with notification channels, alerts would be firing off to the appropriate mechanism and contact. Please see Multi-Cloud Monitoring and Alerting with Prometheus and Grafana for instructions.

Before we bring our terminated Mimir instance back up, return to the Explore page in Grafana and query our sum:up recording rule. We can see that Mimir continued to record metrics even though a Mimir instance was down.

Finally, bring the Mimir instance back up:

docker compose start mimir-3

Return to the Alerting page and notice that our alert status is back to Normal.

Conclusion

In this tutorial, you learned how to run Grafana Mimir and MinIO in a high-availability configuration. We consumed Prometheus metrics from Mimir itself, then queried and visualized them in Grafana. We also configured a recording rule and an alert and verified that the alert fired as expected when the condition was met.

You can also configure Mimir and Grafana to scrape Prometheus metrics from MinIO and fire alerts via Alertmanager. Mimir stores data in object storage for persistence, allowing it to take advantage of ubiquitous, cost-effective and high-durability MinIO.

Give Grafana Mimir a go! If you have questions, please join our Slack channel or send us an email at [email protected].
