Prometheus promql

From wikinotes
Revision as of 18:34, 23 February 2023 by Will

PromQL is Prometheus's query language.
Its syntax is inspired by Go.
You can query Prometheus from:

  • HTTP API
  • UI table/graph view
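As a rough illustration of the HTTP API route: an instant query is a GET request to /api/v1/query, with the PromQL expression passed in the 'query' parameter. The sketch below only builds the URL and does not contact a server. The address http://localhost:9090 and the helper name instant_query_url are assumptions made for the example.

```python
from urllib.parse import urlencode


def instant_query_url(base_url, promql, time=None):
    """Build the URL of an instant query (hypothetical helper, not part of any library)."""
    params = {"query": promql}
    if time is not None:
        # optional evaluation timestamp, in unix seconds
        params["time"] = time
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode(params)}"


# assumed server address; replace with your own Prometheus instance
url = instant_query_url("http://localhost:9090", 'your_metric_name{job="foo"}')
print(url)
```

The expression has to be URL-encoded because braces, quotes and '=' are not safe in a query string.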

Documentation

official docs https://prometheus.io/docs/prometheus/latest/querying/basics/
official examples https://prometheus.io/docs/prometheus/latest/querying/examples/
builtin functions https://prometheus.io/docs/prometheus/latest/querying/functions/
re2 (regex engine) https://github.com/google/re2/wiki/Syntax

Comments

# a comment

Datatypes

Strings

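# string (single or double quotes; escapes such as \n are processed, as in Go)
"foo"
'foo\nbar'

# raw string (backticks; no escape processing)
`foo\nbar`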

Floats

23
-2.43
3.4e-9
0x8f
-Inf
NaN

Metric Datatypes

# scalar:  (floats)
23
2.43

# instant-vector: a set of time series, each with one sample (a float) at the same timestamp, identified by its labels
my_metric 1.2
my_metric{hostname="foobar", user="baz"}  1.2

# range-vector: a set of time series, each with a window of samples over time
my_metric[10m]

Metric-Selectors

Basics

The most basic query is a bare metric name such as your_metric_name, which returns the most recent sample of every series with that name.
Selectors can be narrowed with label matchers and combined with operators.

{__name__="your_metric_name"}            # same as your_metric_name (with =~ on __name__ you can match several metrics)
sum by(device) ({device =~ ".+"})        # list the distinct values of a label (here, 'device')

your_metric_name                         # latest sample of every series of this metric
your_metric_name[5m]                     # samples from the last 5 minutes of each series (range vector)
your_metric_name{job="foo",group="bar"}  # filter by labels

Operators

Label matchers support the following operators.

=   # equal
!=  # not-equal
=~  # regex match (fully anchored: the pattern must match the whole value)
!~  # not regex match (also fully anchored)
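The four matchers can be modelled in a few lines. This is a sketch of the semantics only, not how Prometheus implements them: it uses Python's re module where Prometheus uses RE2, so some patterns will behave differently. The function name matches and the sample labels are made up for the example.

```python
import re


def matches(labels, name, op, value):
    """Apply one PromQL-style label matcher to a dict of labels (illustrative only)."""
    actual = labels.get(name, "")  # a missing label behaves like an empty string
    if op == "=":
        return actual == value
    if op == "!=":
        return actual != value
    if op == "=~":
        # fullmatch mirrors the anchoring: the pattern must cover the whole value
        return re.fullmatch(value, actual) is not None
    if op == "!~":
        return re.fullmatch(value, actual) is None
    raise ValueError(f"unknown matcher operator: {op}")


series = {"job": "foo", "group": "bar"}
print(matches(series, "job", "=", "foo"))
print(matches(series, "job", "=~", "fo"))    # "fo" alone does not match "foo"
print(matches(series, "job", "=~", "fo.*"))
print(matches(series, "env", "!=", "prod"))  # 'env' is absent, so it compares as ""
```

The practical consequence of anchoring is that job=~"fo" does not select job="foo"; you need job=~"fo.*".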

Clustering Metrics

your_metric_name[5m]   # select the last 5 minutes of samples for each series

Units

ms  # milliseconds
s   # seconds
m   # minutes
h   # hours
d   # days
w   # weeks
y   # years
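To make the units concrete, here is a small sketch that converts a duration such as 5m or 1h30m into seconds. It assumes the fixed conversions d = 24h, w = 7d and y = 365d. The helper name duration_seconds is hypothetical, and the sketch is looser than the real parser: it does not enforce that units appear from longest to shortest.

```python
import re

# milliseconds per unit; d, w and y are fixed-length (no calendar awareness)
UNIT_MS = {
    "ms": 1,
    "s": 1000,
    "m": 60 * 1000,
    "h": 3600 * 1000,
    "d": 24 * 3600 * 1000,
    "w": 7 * 24 * 3600 * 1000,
    "y": 365 * 24 * 3600 * 1000,
}


def duration_seconds(text):
    """Convert a duration like '5m' or '1h30m' to seconds (illustrative parser)."""
    parts = re.findall(r"(\d+)(ms|s|m|h|d|w|y)", text)
    # reject anything with leftover characters, e.g. '5min'
    if not parts or "".join(number + unit for number, unit in parts) != text:
        raise ValueError(f"not a valid duration: {text!r}")
    return sum(int(number) * UNIT_MS[unit] for number, unit in parts) / 1000


print(duration_seconds("5m"))
print(duration_seconds("1h30m"))
print(duration_seconds("500ms"))
```

Note that 'min' is not a unit: a string such as 5min raises ValueError here.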

Query Time Ranges

your_metric_name offset 5m     # value as it was 5 minutes before the evaluation time

your_metric_name @ 1609746000 # query at exactly '2021-01-04T07:40:00+00:00'
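Both modifiers only change the instant at which the selector is evaluated. The sketch below shows the arithmetic: it converts the unix timestamp given to @ into a UTC time, then shifts it back by an offset. The helper names at_timestamp and with_offset are invented for the example.

```python
from datetime import datetime, timedelta, timezone


def at_timestamp(unix_seconds):
    """Evaluation time selected by '@ <unix_seconds>', as a UTC datetime."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


def with_offset(evaluation_time, offset):
    """Evaluation time after applying 'offset <duration>' (shifted into the past)."""
    return evaluation_time - offset


pinned = at_timestamp(1609746000)
print(pinned.isoformat())
print(with_offset(pinned, timedelta(minutes=5)).isoformat())
```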

Queries

Basics

A simple query is a metric selector, optionally filtered by labels.

your_metric_name{job="foo"}

SubQueries

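A subquery evaluates an instant query repeatedly over a range at a given resolution, written [range:resolution].
From the official examples, the 5-minute rate over the past 30 minutes, at 1-minute resolution: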

rate(http_requests_total[5m])[30m:1m]

Metric Operators

You can do arithmetic on metrics.
When both sides are vectors, samples are paired by their label sets; see the docs for the matching rules.

metric_start - metric_end

Math

+  # add
-  # subtract
*  # multiplication
/  # division
%  # modulo
^  # exponent

Metric Matching

You can join metrics using labels.

metric_1 and     metric_2  # only elements of metric_1 with exactly matching label-sets in metric_2
metric_1 or      metric_2  # all elements of metric_1, plus elements of metric_2 whose label-sets have no match in metric_1
metric_1 unless  metric_2  # only elements of metric_1, where there are no matching label-sets in metric_2

You can also filter which labels you want to match for an operation.

on(label, label, ...)        # when joining metrics, only look at these labels
ignoring(label, label, ...)  # when joining metrics, look at all labels except these

# you can use these for any operator
metric_1 * on(my_label) metric_2  # multiply metric_1 and metric_2, where metric_1/2's 'my_label' value is the same.
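The set operators and on(...) can be modelled as follows. This is a simplified sketch, not the Prometheus implementation: a vector is a list of (labels, value) pairs, and two elements match when they agree on the labels listed in 'on'. The function names and sample data are made up.

```python
def match_key(labels, on):
    """Values of the 'on' labels; a missing label counts as an empty string."""
    return tuple(labels.get(name, "") for name in on)


def vec_and(left, right, on):
    """left and on(...) right: keep left elements that have a match on the right."""
    right_keys = {match_key(labels, on) for labels, _ in right}
    return [(labels, value) for labels, value in left if match_key(labels, on) in right_keys]


def vec_unless(left, right, on):
    """left unless on(...) right: keep left elements that have no match on the right."""
    right_keys = {match_key(labels, on) for labels, _ in right}
    return [(labels, value) for labels, value in left if match_key(labels, on) not in right_keys]


def vec_or(left, right, on):
    """left or on(...) right: all of left, plus right elements with no match on the left."""
    left_keys = {match_key(labels, on) for labels, _ in left}
    return list(left) + [(labels, value) for labels, value in right if match_key(labels, on) not in left_keys]


disk = [
    ({"instance": "a", "mountpoint": "/"}, 0.9),
    ({"instance": "b", "mountpoint": "/"}, 0.8),
]
info = [({"instance": "a", "nodename": "my-server"}, 1.0)]

print(vec_and(disk, info, ["instance"]))     # only the element for instance "a"
print(vec_unless(disk, info, ["instance"]))  # only the element for instance "b"
print(vec_or(disk, info, ["instance"]))      # both disk elements; info adds nothing new
```

In all three cases the values of the right-hand side are ignored; it is used only as a filter.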


Examples

Return filesystem usage above 75%,
for servers 'my-server' and 'my-other-server' only.

(1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) > 0.75  # fraction of space used, keeping only values above 75%
  and on(instance)                                                      # inner-join where instance matches
    node_uname_info{nodename=~"my-server|my-other-server"}              # metric, scoped only to my-server, my-other-server (used to filter instances)

You can also re-use this pattern of filtering with much larger queries.

# where filesystem-usage is > 75%
(1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) > 0.75

  # join where both instance/mountpoint match
  and on(instance, mountpoint)
  (
    # where any of:
    #  mountpoint in (/var/log, /var/audit) and nodename="server1"
    #  mountpoint in (/usr/ports, /)        and nodename="server2"
    #  mountpoint = /var/mail               and nodename="server3"
    #
    (node_filesystem_avail_bytes{mountpoint=~"/var/log|/var/audit"}
     and on(instance) node_uname_info{nodename="server1"})

    or (node_filesystem_avail_bytes{mountpoint=~"/usr/ports|/"}
        and on(instance) node_uname_info{nodename="server2"})

    or (node_filesystem_avail_bytes{mountpoint="/var/mail"}
        and on(instance) node_uname_info{nodename="server3"})
  )

Aggregates

sum(your_metric_name)                      # aggregate function
sum without (duration) (your_metric_name)  # excludes 'duration' labels from sum
sum by (job, duration) (your_metric_name)  # group sums by label 'job' and 'duration'
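The difference between 'by' and 'without' is which labels form the group key. The following sketch models both for sum, treating a vector as a list of (labels, value) pairs. It is an illustration of the grouping rule under that simplified model; the function names and sample data are assumptions.

```python
from collections import defaultdict


def sum_by(vector, by):
    """sum by (...): group on the listed labels only, dropping all others."""
    totals = defaultdict(float)
    for labels, value in vector:
        group = tuple((name, labels.get(name, "")) for name in by)
        totals[group] += value
    return dict(totals)


def sum_without(vector, without):
    """sum without (...): group on every label except the listed ones."""
    totals = defaultdict(float)
    for labels, value in vector:
        group = tuple(sorted((name, val) for name, val in labels.items() if name not in without))
        totals[group] += value
    return dict(totals)


vector = [
    ({"job": "api", "duration": "fast"}, 1.0),
    ({"job": "api", "duration": "slow"}, 2.0),
    ({"job": "db", "duration": "fast"}, 4.0),
]

print(sum_by(vector, ["job"]))
print(sum_without(vector, ["duration"]))
```

With this sample data both calls give the same groups, because 'job' is the only label left once 'duration' is removed.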

If you need the total increase of a counter over a time range, you can use

sum(increase(some_metric[10m]))  # increase over the last 10 minutes, summed across all series

# simple
sum           # sum elements
min           # smallest of elements
max           # largest of elements
avg           # average of elements
count         # num of elements
count_values  # num elements with same value

# complex
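group         # one element per group, with value 1
stddev        # population standard deviation of elements
stdvar        # population variance of elements
bottomk       # smallest k elements
topk          # largest k elements
quantile      # φ-quantile (0 ≤ φ ≤ 1) of elements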

Functions

There are several builtin functions.

# the number of LBAs read is a very large counter, so its graph looks like a flat line.
# graphing its per-second rate of change shows the spikes in usage.
rate(node_smartctl_total_lbas_read_raw[10m])
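To show why a rate reveals spikes that the raw counter hides, here is a deliberately simplified model of a per-second rate over a window of counter samples. It is not the real rate() algorithm, which also extrapolates to the window boundaries; this sketch only sums the increases between samples, treats a drop as a counter reset, and divides by the time between the first and last sample. The name simple_rate and the sample values are invented.

```python
def simple_rate(samples):
    """Approximate per-second rate of a counter.

    samples: list of (timestamp_seconds, counter_value), oldest first.
    """
    if len(samples) < 2:
        return None  # at least two samples are needed to measure a change
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        # a drop means the counter was reset; count the new value from zero
        increase += current - previous if current >= previous else current
    return increase / elapsed


# counter resets between t=60 and t=120
samples = [(0, 100), (60, 160), (120, 10), (180, 70)]
print(simple_rate(samples))
```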