Prometheus rules
From wikinotes
Latest revision as of 21:53, 19 February 2022
Prometheus rules either precompute new metrics (recording rules) or trigger alerts (alerting rules).
Documentation
official docs https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/
Best Practices
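- a rule file can contain several rules for the same condition with increasing severity (for example a warning threshold, then a critical one)
- alert labels and annotations can use templating (such as {{ $labels.instance }}), and a rule applies to every series its expression matches, so a single rule can cover many hosts at once
- each rule file should cover a single type of check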
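As a minimal sketch of the first two points, the hypothetical rule file below defines the same memory check at two severities and uses a templated annotation so each alert names the affected host. The group name, alert names, thresholds, and <code>for</code> durations are illustrative assumptions, not values taken from this page's setup.

```yaml
# hypothetical rule file: one type of check (free memory), two severities
groups:
  - name: memory_usage_tiers              # hypothetical group name
    rules:
      - alert: high_memory_usage_warning  # hypothetical alert name
        # fires when 10% or less of memory has been free for 30 minutes
        expr: (node_memory_free_bytes / node_memory_size_bytes) <= 0.10
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "host {{ $labels.instance }} has 10% or less free memory"
      - alert: high_memory_usage_critical # hypothetical alert name
        # stricter threshold, shorter wait before firing
        expr: (node_memory_free_bytes / node_memory_size_bytes) <= 0.05
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "host {{ $labels.instance }} has 5% or less free memory"
```

Both rules are evaluated independently, so a host in the critical range triggers both alerts unless the alert routing suppresses the lower severity.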
Syntax
Recording Rules (custom metrics)
<syntaxhighlight lang="yaml">
# /usr/local/etc/alertmanager/rules/memory_rules.yml
groups:
  - name: available_memory
    rules:
      - record: node_available_memory_percent                  # metric name the rule result is exposed as
        expr: node_memory_free_bytes / node_memory_size_bytes  # PromQL query the rule evaluates (a 0-1 ratio)
</syntaxhighlight>

<syntaxhighlight lang="yaml">
# /usr/local/etc/prometheus.yml
rule_files:
  - "alertmanager/rules/memory_rules.yml"  # path relative to prometheus.yml
</syntaxhighlight>

<syntaxhighlight lang="bash">
service prometheus restart
</syntaxhighlight>

The <code>node_available_memory_percent</code> metric should now be precomputed and queryable.

Alert Rules
<syntaxhighlight lang="yaml">
# /usr/local/etc/memory_alerts.yml
groups:
  - name: available_memory
    rules:
      - alert: sustained_high_memory_usage
        # free/total <= 0.1 means at least 90% of memory is in use
        expr: (node_memory_free_bytes / node_memory_size_bytes) <= 0.1
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "host {{ $labels.instance }} has sustained high memory usage"
          description: "{{ $labels.instance }} has had >=90% memory usage for at least an hour!"
</syntaxhighlight>
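Once the recording rule from the Recording Rules section is loaded, an alert can reference the recorded series instead of repeating the raw expression. The sketch below assumes <code>node_available_memory_percent</code> holds the free/total ratio (0 to 1) as recorded there; the group name, alert name, threshold, and <code>for</code> duration are hypothetical.

```yaml
groups:
  - name: available_memory_recorded     # hypothetical group name
    rules:
      - alert: low_available_memory     # hypothetical alert name
        # the recorded series is a 0-1 ratio of free to total memory
        expr: node_available_memory_percent <= 0.1
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "host {{ $labels.instance }} has 10% or less free memory"
```

This keeps the division in one place, so changing how available memory is calculated only requires editing the recording rule.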
Validation
Web UI
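Alerting rules currently loaded by Prometheus should show up in the web UI under Alerts, even when they are not firing. Recording rules do not appear there; loaded rule groups of both kinds are listed on the rules page under Status.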
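Rules can also be exercised offline before checking the web UI. The sketch below is a unit-test file in the format read by <code>promtool test rules</code>. It assumes the test file sits in the same directory as <code>memory_rules.yml</code>; the test file name, the <code>host1</code> instance, and the sample values are made up for illustration.

```yaml
# memory_rules_test.yml (hypothetical file name)
rule_files:
  - memory_rules.yml            # assumed to be in the same directory as this test file
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      # constant synthetic samples: 25 bytes free out of 100 bytes total
      - series: 'node_memory_free_bytes{instance="host1"}'
        values: '25+0x10'
      - series: 'node_memory_size_bytes{instance="host1"}'
        values: '100+0x10'
    promql_expr_test:
      # the recording rule should expose free/total as a 0-1 ratio
      - expr: node_available_memory_percent
        eval_time: 5m
        exp_samples:
          - labels: 'node_available_memory_percent{instance="host1"}'
            value: 0.25
```

Running <code>promtool test rules memory_rules_test.yml</code> evaluates the rule against the synthetic series and reports any mismatch with the expected sample.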