sia.hackernoon.com

Starburst is a powerful distributed SQL engine that allows teams to query data across multiple sources. However, when it comes to monitoring Starburst clusters, there are some limitations:

The Starburst web UI gives basic cluster health
The Starburst API exposes some system metrics
But full metrics (CPU, JVM, memory, total queries, slow queries, etc.) require additional integration

We manage 50+ Starburst clusters running across multiple environments (OpenShift, Kubernetes, Linux).

Manually going into the Starburst UI or CLI to run queries and check metrics was becoming too complex.

That’s why we Come up a solution using:

Grafana dashboards
Grafana Trino Plugin
Prometheus + JMX Exporter → full Starburst metrics export

Why Native Starburst Metrics Are Limited

Out of the box, Starburst exposes basic system metrics:

Active queries
Running nodes
Query latency
Query errors

But for deeper monitoring — you need:

JVM metrics (CPU usage, memory, GC, threads)
Detailed query metrics (count, latency, slow queries)
Resource usage per cluster
Cluster-wide trends over time
Alerting on custom thresholds

Starburst does not expose all of these metrics directly in the UI or API. This is why we need an additional metrics pipeline.

Solution Architecture

To fully monitor Starburst, we built this architecture:

+---------------------+    JMX Exporter    +---------------+
| Starburst Cluster 1 |  --> Port 8081 -->  | Prometheus    |
+---------------------+                   +---------------+

+---------------------+    JMX Exporter    +---------------+
| Starburst Cluster N |  --> Port 8081 -->  | Prometheus    |
+---------------------+                   +---------------+

                Prometheus --> Grafana --> Dashboards / Alerts

                + Grafana Trino Plugin --> SQL-based dashboards

Components:

JMX Exporter → exposes Starburst JVM + query metrics to Prometheus
Prometheus → scrapes all Starburst clusters at regular intervals
Grafana → visualizes metrics
Grafana Trino Plugin → allows you to run SQL queries on Starburst directly and show results in Grafana panels

Why Use Grafana Trino Plugin?

The Grafana Trino Plugin is extremely useful for Starburst monitoring because:

It connects Grafana directly to Starburst as a SQL data source
You can run Starburst SQL queries in Grafana panels
You can refresh results dynamically
You can build dashboards with panels per cluster
It scales to multiple clusters easily

Without this plugin, you would need to:

Manually log into each Starburst UI
Run the same query multiple times
Manually copy/paste results
Very slow and not scalable

With Grafana Trino Plugin:

Configure each cluster as a Trino data source in Grafana
Write SQL queries once
Save dashboards
Refresh all clusters with one click
Add alerts and trends → no manual work

Grafana Trino Plugin - Overview

The Trino datasource plugin allows you to query and visualize Trino (and Starburst) data inside Grafana.

Installation

Run Grafana with the plugin using Docker:

docker run -d -p 3000:3000 \
  -v "$(pwd):/var/lib/grafana/plugins/trino" \
  -e "GF_PLUGINS_ALLOW_LOADING_UNSIGNED_PLUGINS=trino-datasource" \
  --name=grafana \
  grafana/grafana-oss

Features

Authentication: HTTP Basic, TLS client, OAuth, JWT
Raw SQL editor — run any query
Works with Starburst OSS, Galaxy, and Enterprise
Macros supported → useful for time series panels

Macros Supported

$timeFrom($column) — lower boundary of time range
$timeTo($column) — upper boundary of time range
$timeGroup($column, $interval) — group by time
$dateFilter($column) — date range filter
$timeFilter($column) — timestamp range filter
$unixEpochFilter($column) — unix timestamp range

Example Query Using Macros

SELECT
  atimestamp AS time,
  metric_value AS value
FROM starburst_metrics_table
WHERE $__timeFilter(atimestamp) AND cluster_name IN($cluster)
ORDER BY atimestamp ASC

How To Set It Up

1. Enable JMX Exporter in Starburst

Configure JMX agent in your Starburst deployment:

start.args=-javaagent:/opt/starburst/jmx_prometheus_javaagent.jar=8081:/opt/starburst/jmx_exporter_config.yaml

2. Configure Prometheus

Add all your Starburst clusters:

- job_name: 'starburst'
  static_configs:
    - targets: ['starburst-cluster-1:8081', 'starburst-cluster-2:8081']

3. Install Grafana Trino Plugin

Add Trino Plugin in Grafana → Data Sources
Provide Starburst cluster URL + credentials
Test connection

4. Build Dashboards

Create a panel → select Trino Plugin data source
Write your Starburst SQL query:

SELECT state, count(*) FROM system.runtime.queries GROUP BY state

Save panel → use macros for time filters
Build dashboards with multiple panels per cluster

Benefits

Monitor Multiple Starburst clusters from one Grafana dashboard
Run Starburst SQL queries as panels → live refresh
Combine Prometheus metrics + SQL panels
Full CPU, JVM, memory, query stats
Add Slack / PagerDuty alerts for Starburst anomalies

Limitations and Tips

Starburst native metrics are limited → configure JMX Exporter carefully
Grafana Trino Plugin is best for SQL panels → use Prometheus panels for JVM metrics
Test scrape intervals carefully → avoid overloading clusters
Use macros to make dashboards dynamic and reusable across clusters

Conclusion

If you want full Starburst monitoring, this is the best architecture:

Enable JMX Exporter → Prometheus → Grafana
Use Grafana Trino Plugin → SQL dashboards → live queries
Monitor Multiple clusters from one place
Add alerts, trends, historical analysis
Share dashboards with teams

This is now my preferred way of managing Starburst monitoring in large environments.

Final Architecture Summary

Starburst --> JMX Exporter --> Prometheus --> Grafana Panels + Trino Plugin SQL Queries --> Slack / Alerts / Visual Dashboards

Master Starburst Monitoring with Grafana and Trino Plugin