Prometheus promql
PromQL is Prometheus's query language.
Its syntax is inspired by Go.
You can query Prometheus from:
- HTTP API
- UI table/graph view
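As a rough sketch of the HTTP API route, the snippet below builds the URL for an instant query against the `/api/v1/query` endpoint. The server address and the helper name `instant_query_url` are assumptions made for illustration; only the endpoint path and the `query`/`time` parameters come from the Prometheus HTTP API.

```python
from urllib.parse import urlencode

# Assumed address of a local Prometheus server (9090 is the default port).
BASE_URL = "http://localhost:9090"


def instant_query_url(expr, time=None, base_url=BASE_URL):
    """Build the URL for an instant query (hypothetical helper)."""
    params = {"query": expr}
    if time is not None:
        # Optional evaluation timestamp (Unix seconds or RFC 3339).
        params["time"] = str(time)
    # urlencode percent-encodes braces, quotes and '=' inside the expression.
    return f"{base_url}/api/v1/query?{urlencode(params)}"
```

Calling `instant_query_url('your_metric_name{job="foo"}')` gives a URL that can be fetched with any HTTP client; the response is JSON.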
Documentation
- official docs: https://prometheus.io/docs/prometheus/latest/querying/basics/
- official examples: https://prometheus.io/docs/prometheus/latest/querying/examples/
- builtin functions: https://prometheus.io/docs/prometheus/latest/querying/functions/
- re2 (regex engine): https://github.com/google/re2/wiki/Syntax
Comments
# a comment
Datatypes
Strings
# string (single or double quotes, escape sequences like \n are interpreted)
"foo"
'foo\nbar'
# raw string (backticks, no escape processing)
`foo`
Floats
23 -2.43 3.4e-9 0x8f -Inf NaN
Metric Datatypes
# scalar: (floats)
23
2.43
# instant-vector: (still floats, but with the capacity to also carry metric labels)
my_metric 1.2
my_metric{hostname="foobar", user="baz"} 1.2
# range-vectors: (for each series, the samples inside a time window, here the last 10 minutes)
my_metric[10m]
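These datatypes show up again in HTTP API responses as the `resultType` field (`scalar`, `vector` for instant vectors, `matrix` for range vectors). The sketch below flattens each shape into `(labels, timestamp, value)` tuples. The `SAMPLE` payload is hand-written for illustration, `flatten` is a hypothetical helper, and the code assumes plain float samples (no native histograms).

```python
import json

# Hand-written sample in the shape returned by /api/v1/query;
# metric name and labels are borrowed from the notes above.
SAMPLE = """
{"status": "success",
 "data": {"resultType": "vector",
          "result": [{"metric": {"__name__": "my_metric",
                                 "hostname": "foobar",
                                 "user": "baz"},
                      "value": [1609746000, "1.2"]}]}}
"""


def flatten(response_text):
    """Return (labels, timestamp, value) tuples for scalar/vector/matrix results."""
    data = json.loads(response_text)["data"]
    kind, result = data["resultType"], data["result"]
    if kind == "scalar":
        # A scalar is one [timestamp, "value"] pair and carries no labels.
        ts, value = result
        return [({}, ts, float(value))]
    if kind == "vector":
        # Instant vector: exactly one sample per series.
        return [(s["metric"], s["value"][0], float(s["value"][1])) for s in result]
    if kind == "matrix":
        # Range vector: a list of samples per series.
        return [(s["metric"], ts, float(v)) for s in result for ts, v in s["values"]]
    raise ValueError(f"unsupported resultType: {kind}")
```

Sample values arrive as strings, which is why each one is passed through `float()`.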
Metric-Selectors
Basics
The most basic query you can use is
your_metric_name
which queries all samples for that metric.
These metric selectors can be composed and filtered.
{__name__="your_metric_name"}            # select by metric name (a regex matcher on __name__ can match multiple metrics)
sum by(device) ({device =~ ".+"})        # query all distinct tag values for a tag (here, 'device' is tag)
your_metric_name                         # query all
your_metric_name[5m]                     # range vector: samples from the last 5 minutes
your_metric_name{job="foo",group="bar"}  # filter by metric-labels
Operators
Label selectors support these matchers/operators:
=   # equal
!=  # not-equal
=~  # regex match (the regex is fully anchored)
!~  # not regex match
Clustering Metrics
your_metric_name[5m]  # range vector: samples from the last 5 minutes
Units
ms  # milliseconds
s   # seconds
m   # minutes
h   # hours
d   # days
w   # weeks
y   # years
Query Time Ranges
your_metric_name offset 5m      # values as of 5 minutes before the evaluation time
your_metric_name @ 1609746000   # query at exactly '2021-01-04T07:40:00+00:00'
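To make the duration units and the `@` timestamp concrete, here is a small sketch that converts a duration string to seconds and a Unix timestamp to an ISO date. Both helper names are made up for this note. The parser is deliberately loose: it accepts combined durations such as `1h30m` but does not enforce PromQL's rule that units appear from longest to shortest.

```python
import re
from datetime import datetime, timezone

# Milliseconds per unit. PromQL treats a day as 24h, a week as 7d
# and a year as 365d.
UNIT_MS = {
    "ms": 1, "s": 1000, "m": 60_000, "h": 3_600_000,
    "d": 86_400_000, "w": 7 * 86_400_000, "y": 365 * 86_400_000,
}
# "ms" is listed before "m" and "s" so that it wins for inputs like "500ms".
_PART = re.compile(r"(\d+)(ms|s|m|h|d|w|y)")


def duration_seconds(text):
    """Convert a duration such as '5m' or '1h30m' to seconds (hypothetical helper)."""
    pos, total_ms = 0, 0
    while pos < len(text):
        match = _PART.match(text, pos)
        if match is None:
            raise ValueError(f"invalid duration: {text!r}")
        total_ms += int(match.group(1)) * UNIT_MS[match.group(2)]
        pos = match.end()
    if pos == 0:
        raise ValueError("empty duration")
    return total_ms / 1000


def at_modifier_time(unix_seconds):
    """Render the Unix timestamp used with '@' as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc).isoformat()
```

`at_modifier_time(1609746000)` reproduces the date in the comment above, and `duration_seconds("1h30m")` returns 5400.0.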
Queries
Basics
A simple query is a metric selector with an optional label filter:
your_metric_name{job="foo"}
SubQueries
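You can combine functions and metrics.
A subquery evaluates an expression over a range at a fixed resolution, written [range:resolution].
From official examples:
rate(http_requests_total[5m])[30m:1m]  # 5-minute rate, for the past 30 minutes, at a resolution of 1 minute
Metric Operators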
You can do simple math using metrics.
There is some type-trickiness here, see docs for details.
metric_start - metric_end
Math
+  # add
-  # subtract
*  # multiplication
/  # division
%  # modulo
^  # exponent
Metric Matching
You can join metrics using labels.
metric_1 and metric_2     # only elements of metric_1 with exactly matching label-sets in metric_2
metric_1 or metric_2      # all elements of metric_1, and elements of metric_2 with non-matching labels
metric_1 unless metric_2  # only elements of metric_1, where there are no matching label-sets in metric_2
You can also filter which labels you want to match for an operation.
on(label, label, ...)        # when joining metrics, only look at these labels
ignoring(label, label, ...)  # when joining metrics, look at all labels except these
# you can use these for any operator
metric_1 * on(my_label) metric_2  # multiply metric_1 and metric_2, where metric_1/2's 'my_label' value is the same.
Dividing two metrics can produce the error:
vector cannot contain metrics with the same labelset
Avoid it by aggregating both sides by the same label first:
sum by (city) (rate (births{}[2h])) / sum by (city) (rate (deceased{}[2h]))
Examples
Return filesystem usage above 75%,
for servers 'my-server' and 'my-other-server' only.
(1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) > 0.75  # used fraction (1 - available/size) above 75%
and on(instance)                                                        # inner-join where instance matches
node_uname_info{nodename=~"my-server|my-other-server"}                  # metric, scoped only to my-server, my-other-server (used to filter instances)
You can also re-use this pattern of filtering with much larger queries.
# where filesystem-usage is > 75%
(1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) > 0.75
# join where both instance/mountpoint match
and on(instance, mountpoint) (
  # where any of:
  #   mountpoint in (/var/log, /var/audit) and nodename="server1"
  #   mountpoint in (/usr/ports, /)        and nodename="server2"
  #   mountpoint = /var/mail               and nodename="server3"
  #
  (node_filesystem_avail_bytes{mountpoint=~"/var/log|/var/audit"} and on(instance) node_uname_info{nodename="server1"}) or
  (node_filesystem_avail_bytes{mountpoint=~"/usr/ports|/"} and on(instance) node_uname_info{nodename="server2"}) or
  (node_filesystem_avail_bytes{mountpoint="/var/mail"} and on(instance) node_uname_info{nodename="server3"})
)
Aggregates
sum(your_metric_name)                      # aggregate function
sum without (duration) (your_metric_name)  # excludes 'duration' labels from sum
sum by (job, duration) (your_metric_name)  # group sums by label 'job' and 'duration'
If you need the total increase of a counter over a time range, you can use
sum(increase(some_metric[10m]))
# simple
sum           # sum elements
min           # smallest of elements
max           # largest of elements
avg           # average of elements
count         # num of elements
count_values  # num elements with same value
# complex
group
stddev
stdvar
bottomk
topk
quantile
Functions
There are several builtin functions.
# num LBAs read is very high, looks like flat line.
# measuring the rate of change shows spikes in usage.
rate(node_smartctl_total_lbas_read_raw[10m])
# build new label on metric from value
label_replace(some_metric{}, "new_label", "$1", "old_label", "^(regex-capture)-on-old-value")
# label-replace for range-vectors ($__range is a Grafana variable, not plain PromQL)
label_replace(some_metric{}, "new_label", "$1", "old_label", "^(regex-capture)-on-old-value")[$__range:]  # note ':' at end for range
# label-replace for empty vectors (ex. 'foo{} or on() vector(0)' to sub in a 0 value)
label_replace(vector(0), "my_new_label", "my_value", "", "")
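The label_replace behaviour above can be modelled for a single series. This is only a sketch: `model_label_replace` is a made-up name, Python's `re` stands in for RE2, and only numbered references such as `$1` are handled. It assumes the regex must match the whole source value and that a non-matching series passes through unchanged.

```python
import re


def model_label_replace(labels, dst, replacement, src, regex):
    """Model label_replace() on one series, given as a dict of labels."""
    # A missing source label is treated as an empty string; the regex
    # has to match the entire value (fullmatch models the anchoring).
    match = re.fullmatch(regex, labels.get(src, ""))
    if match is None:
        return dict(labels)  # no match: series returned unchanged
    # Rewrite "$1" as "\g<1>" so Python can expand the capture group.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    value = match.expand(template)
    out = dict(labels)
    if value:
        out[dst] = value
    else:
        out.pop(dst, None)  # an empty result removes the label
    return out
```

For example, `model_label_replace({"instance": "db1:9100"}, "host", "$1", "instance", "(.+):.+")` adds `host="db1"` (the label names and values here are invented). With an empty label set and empty `src`/`regex`, as in the `vector(0)` case, it simply attaches the fixed label.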