Prometheus promql

From wikinotes
 
Latest revision as of 17:11, 30 November 2023

PromQL is Prometheus's query language.
Its syntax is inspired by Go.
You can query Prometheus from:

  • HTTP API
  • UI table/graph view
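
As a rough sketch of the HTTP API route: an instant query is a GET request to /api/v1/query with the expression in the query parameter. The server address http://localhost:9090 and the helper name instant_query_url are assumptions for illustration; the snippet only builds the URL and does not contact a server.

```python
from urllib.parse import urlencode


def instant_query_url(base_url, promql, time=None):
    """Build the URL for an instant query against the Prometheus HTTP API."""
    params = {"query": promql}
    if time is not None:
        params["time"] = str(time)  # optional evaluation timestamp (Unix seconds)
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode(params)}"


# Hypothetical local server; the expression is URL-encoded by urlencode.
url = instant_query_url("http://localhost:9090", 'your_metric_name{job="foo"}')
print(url)
```

The resulting URL can be passed to curl or any HTTP client; the response is JSON.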

Documentation

official docs https://prometheus.io/docs/prometheus/latest/querying/basics/
official examples https://prometheus.io/docs/prometheus/latest/querying/examples/
builtin functions https://prometheus.io/docs/prometheus/latest/querying/functions/
re2 (regex engine) https://github.com/google/re2/wiki/Syntax

Comments

# a comment

Datatypes

Strings

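# strings (single or double quotes; escape sequences such as \n are processed)
"foo"
'foo\nbar'

# raw string (backticks; escape sequences are not processed)
`foo\nbar`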

Floats

23
-2.43
3.4e-9
0x8f
-Inf
NaN

Metric Datatypes

# scalar: a single float value
23
2.43

# instant vector: a set of time series, each with one sample (a float plus its labels) at the same timestamp
my_metric 1.2
my_metric{hostname="foobar", user="baz"}  1.2

# range vector: a set of time series, each with a window of samples over time
my_metric[10m]

Metric-Selectors

Basics

The most basic query you can use is your_metric_name, which selects every time series with that metric name.
Metric selectors can be filtered by label and combined into larger expressions.

{__name__="your_metric_name"}            # select by metric name (use =~ with a regex to match multiple metrics)
sum by(device) ({device =~ ".+"})        # list all distinct values of a label (here, 'device')

your_metric_name                         # select all series of this metric
your_metric_name[5m]                     # select the last 5 minutes of samples for each series (range vector)
your_metric_name{job="foo",group="bar"}  # filter by metric-labels

Operators

Label matchers support the following operators:

=   # equal
!=  # not-equal
=~  # regex match
!~  # not regex match
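
One detail that is easy to miss is that regex matchers are fully anchored: the pattern has to match the whole label value. A minimal sketch of the four matchers follows; the function name matches is hypothetical, and Python's re module stands in for RE2 (the two differ on some advanced syntax).

```python
import re


def matches(label_value, op, pattern):
    """Evaluate one label matcher against a label value.

    Regex matchers are fully anchored in PromQL, so re.fullmatch is used.
    """
    if op == "=":
        return label_value == pattern
    if op == "!=":
        return label_value != pattern
    if op == "=~":
        return re.fullmatch(pattern, label_value) is not None
    if op == "!~":
        return re.fullmatch(pattern, label_value) is None
    raise ValueError(f"unknown matcher operator: {op}")


print(matches("server1", "=~", "server"))    # pattern covers only part of the value
print(matches("server1", "=~", "server.*"))  # pattern covers the whole value
```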

Clustering Metrics

your_metric_name[5m]     # range vector: the samples from the last 5 minutes of each series

Units

ms  # milliseconds
s   # seconds
m   # minutes
h   # hours
d   # days
w   # weeks
y   # years
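
To make the unit table concrete, here is a simplified sketch that converts a duration string to seconds. The helper name duration_seconds is hypothetical, and the parser is looser than the real one (it does not enforce the ordering of units, for example).

```python
import re

# Seconds per unit; a week counts as 7 days and a year as 365 days.
UNIT_SECONDS = {
    "ms": 0.001, "s": 1, "m": 60, "h": 3600,
    "d": 86400, "w": 7 * 86400, "y": 365 * 86400,
}


def duration_seconds(text):
    """Convert a duration such as '5m' or '1h30m' to seconds (simplified)."""
    parts = re.findall(r"(\d+)(ms|s|m|h|d|w|y)", text)
    # Reject input with leftover characters that no number+unit pair consumed.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"not a valid duration: {text!r}")
    return sum(int(n) * UNIT_SECONDS[u] for n, u in parts)


print(duration_seconds("5m"))
print(duration_seconds("1h30m"))
```

Anything outside the unit table, such as 5min, raises ValueError in this sketch.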

Query Time Ranges

your_metric_name offset 5m     # evaluate the selector as of 5 minutes before the query time

your_metric_name @ 1609746000 # query at exactly '2021-01-04T07:40:00+00:00'
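
The @ modifier takes a Unix timestamp in seconds. A small sketch for converting between that and a readable date, using only the Python standard library (the helper names are hypothetical):

```python
from datetime import datetime, timezone


def unix_to_iso(ts):
    """Render a Unix timestamp (seconds) as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()


def iso_to_unix(text):
    """Parse an ISO 8601 string with an explicit UTC offset into Unix seconds."""
    return int(datetime.fromisoformat(text).timestamp())


print(unix_to_iso(1609746000))                   # 2021-01-04T07:40:00+00:00
print(iso_to_unix("2021-01-04T07:40:00+00:00"))  # 1609746000
```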

Queries

Basics

The simplest query is a metric selector with an optional label filter.

your_metric_name{job="foo"}

SubQueries

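A subquery evaluates an instant query repeatedly over a time range, using the form <expr>[<range>:<resolution>].
From the official examples, this returns the 5-minute rate of http_requests_total for the past 30 minutes, at a 1-minute resolution: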

rate(http_requests_total[5m])[30m:1m]

Metric Operators

You can do simple math using metrics.
There is some type-trickiness here; see the operator docs (https://prometheus.io/docs/prometheus/latest/querying/operators/) for details.

metric_start - metric_end

Math operators:

+  # add
-  # subtract
*  # multiplication
/  # division
%  # modulo
^  # exponent

Metric Matching

You can join metrics using labels.

metric_1 and     metric_2  # only elements of metric_1 with exactly matching label-sets in metric_2
metric_1 or      metric_2  # all elements of metric_1, and elements of metric_2 with non-matching labels
metric_1 unless  metric_2  # only elements of metric_1, where there are no matching label-sets in metric_2

You can also filter which labels you want to match for an operation.

on(label, label, ...)        # when joining metrics, only look at these labels
ignoring(label, label, ...)  # when joining metrics, look at all labels except these

# you can use these for any operator
metric_1 * on(my_label) metric_2  # multiply metric_1 and metric_2, where metric_1/2's 'my_label' value is the same.
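
To illustrate how on(...) affects matching, here is a toy model in which an instant vector is a list of (labels, value) pairs. It is only a sketch of the semantics described above, not how Prometheus implements them; all names and sample data are hypothetical, and the metric name is left out of the label sets.

```python
def key(labels, on):
    """Project a label set onto the labels named in on(...)."""
    return tuple(labels.get(name, "") for name in on)


def vec_and(left, right, on):
    """left and on(...) right: keep left samples that have a match on the right."""
    right_keys = {key(labels, on) for labels, _ in right}
    return [(labels, v) for labels, v in left if key(labels, on) in right_keys]


def vec_unless(left, right, on):
    """left unless on(...) right: keep left samples with no match on the right."""
    right_keys = {key(labels, on) for labels, _ in right}
    return [(labels, v) for labels, v in left if key(labels, on) not in right_keys]


usage = [
    ({"instance": "a:9100", "mountpoint": "/"}, 0.9),
    ({"instance": "b:9100", "mountpoint": "/"}, 0.8),
]
hosts = [({"instance": "a:9100", "nodename": "my-server"}, 1.0)]

kept = vec_and(usage, hosts, on=["instance"])        # only the sample from a:9100
dropped = vec_unless(usage, hosts, on=["instance"])  # only the sample from b:9100
print(kept, dropped)
```

Without on(instance), the two sides would never match here, because their full label sets differ (mountpoint on one side, nodename on the other).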

Dividing two metrics can fail with the error "vector cannot contain metrics with the same labelset" when the result would contain duplicate label sets.
One way to avoid it is to aggregate both sides onto the same label first:

sum by (city) (rate (births{}[2h]))
/
sum by (city) (rate (deceased{}[2h]))

Examples

Return filesystems that are more than 75% full,
for servers 'my-server' and 'my-other-server' only.

(1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) > 0.75  # fraction of bytes used, kept where > 75%
  and on(instance)                                                  # inner-join where instance matches
    node_uname_info{nodename=~"my-server|my-other-server"}          # metric, scoped only to my-server, my-other-server (used to filter instances)

You can also re-use this pattern of filtering with much larger queries.

# where filesystem-usage is > 75%
(1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) > 0.75

  # join where both instance/mountpoint match
  and on(instance, mountpoint)
  (
    # where any of:
    #  mountpoint in (/var/log, /var/audit) and nodename="server1"
    #  mountpoint in (/usr/ports, /)        and nodename="server2"
    #  mountpoint = /var/mail               and nodename="server3"
    #
    (node_filesystem_avail_bytes{mountpoint=~"/var/log|/var/audit"}
     and on(instance) node_uname_info{nodename="server1"})

    or (node_filesystem_avail_bytes{mountpoint=~"/usr/ports|/"}
        and on(instance) node_uname_info{nodename="server2"})

    or (node_filesystem_avail_bytes{mountpoint="/var/mail"}
        and on(instance) node_uname_info{nodename="server3"})
  )

Aggregates

sum(your_metric_name)                      # aggregate function
sum without (duration) (your_metric_name)  # sum across 'duration': drop that label, keep all others
sum by (job, duration) (your_metric_name)  # group sums by label 'job' and 'duration'
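
The difference between by and without can be sketched with the same kind of toy model: an instant vector as a list of (labels, value) pairs. The function names and sample data are hypothetical, and this is an approximation of the grouping behaviour rather than Prometheus's own implementation.

```python
from collections import defaultdict


def sum_by(vector, by):
    """sum by (...) (vector): group on the listed labels, drop all others."""
    totals = defaultdict(float)
    for labels, value in vector:
        group = tuple((name, labels.get(name, "")) for name in by)
        totals[group] += value
    return dict(totals)


def sum_without(vector, without):
    """sum without (...) (vector): group on every label except the listed ones."""
    totals = defaultdict(float)
    for labels, value in vector:
        group = tuple(sorted((k, v) for k, v in labels.items() if k not in without))
        totals[group] += value
    return dict(totals)


vector = [
    ({"job": "api", "duration": "fast", "instance": "a"}, 1.0),
    ({"job": "api", "duration": "slow", "instance": "a"}, 2.0),
    ({"job": "api", "duration": "fast", "instance": "b"}, 4.0),
]

by_result = sum_by(vector, ["job", "duration"])
without_result = sum_without(vector, ["duration"])
print(by_result)
print(without_result)
```

Here by (job, duration) collapses the instance label, while without (duration) keeps job and instance and collapses only duration.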

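If you need the total increase of a counter over a time range, you can use

sum(increase(some_metric[10m]))  # total increase over the last 10 minutes, summed across all series

Available aggregation operators:

# simple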
sum           # sum elements
min           # smallest of elements
max           # largest of elements
avg           # average of elements
count         # num of elements
count_values  # num elements with same value

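# complex
group         # all values in the result are 1
stddev        # population standard deviation
stdvar        # population variance
bottomk       # smallest k elements by value
topk          # largest k elements by value
quantile      # φ-quantile (0 ≤ φ ≤ 1)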

Functions

There are several builtin functions (https://prometheus.io/docs/prometheus/latest/querying/functions/).

# num LBAs read is very high, looks like flat line.
# measuring the rate of change shows spikes in usage.
rate(node_smartctl_total_lbas_read_raw[10m])

# build new label on metric from value
label_replace(some_metric{}, "new_label", "$1", "old_label", "^(regex-capture)-on-old-value") 

# label-replace for range-vectors ($__range is a Grafana variable, not part of PromQL)
label_replace(some_metric{}, "new_label", "$1", "old_label", "^(regex-capture)-on-old-value")[$__range:]  # note ':' at end, which turns this into a subquery

# label-replace for empty vectors (ex. 'foo{} or on() vector(0)' to sub in a 0 value)
label_replace(vector(0), "my_new_label", "my_value", "", "")
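
The behaviour of label_replace on a single label set can be approximated as follows. This is a sketch under stated assumptions: Python's re module stands in for RE2, the regex is treated as fully anchored, replacements containing backslashes are not handled, and the sample label values are made up.

```python
import re


def label_replace(labels, dst, replacement, src, regex):
    """Return a copy of `labels` with `dst` set from a regex match on `src`.

    If the regex does not match the whole source value, the labels are
    returned unchanged.
    """
    match = re.fullmatch(regex, labels.get(src, ""))
    if match is None:
        return dict(labels)
    # Translate PromQL-style $1 references into Python's \g<1> form.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    out = dict(labels)
    out[dst] = match.expand(template)
    return out


with_host = label_replace({"instance": "web-01:9100"}, "host", "$1", "instance", "(.*):.*")
print(with_host)

# The vector(0) case: an empty source label matches the empty regex,
# so the new label is always set.
print(label_replace({}, "my_new_label", "my_value", "", ""))
```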