Prometheus rules

From wikinotes
Latest revision as of 21:53, 19 February 2022

Rules can be used to create new metrics, or to trigger alerts.

Documentation

official docs: https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/

Best Practices

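  • a rule file can define several alert rules on the same expression with increasing severity (e.g. warning, then critical)
  • alert labels and annotations support templating (e.g. {{ $labels.instance }}), and a rule is evaluated against every matching time series, so one rule can cover many hosts at once
  • each rule file should pertain to a single type of test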
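
The first bullet can be sketched as a single rule group holding two alerts on the same expression. This is an illustration only: the group name, the alert names, and the 0.8 / 0.95 thresholds are hypothetical, and the metric names are simply the ones used in the examples on this page.

```yaml
# hypothetical rule file: one type of test (memory usage), two severities
groups:
  - name: memory_usage
    rules:
    # lower threshold, longer hold time: early warning
    - alert: high_memory_usage
      expr: (1 - node_memory_free_bytes / node_memory_size_bytes) > 0.8
      for: 1h
      labels:
        severity: warning
      annotations:
        summary: "host {{ $labels.instance }} is using more than 80% of its memory"
    # higher threshold, shorter hold time: urgent
    - alert: critical_memory_usage
      expr: (1 - node_memory_free_bytes / node_memory_size_bytes) > 0.95
      for: 10m
      labels:
        severity: critical
      annotations:
        summary: "host {{ $labels.instance }} is using more than 95% of its memory"
```

Once usage has stayed above 95% for longer than both hold times, both alerts are active at the same time, since the critical condition implies the warning condition.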

Syntax

Recording Rules (custom metrics)

# /usr/local/etc/alertmanager/rules/memory_rules.yml

groups:
  - name: available_memory
    rules:
    - record: node_available_memory_percent                            # name the rule result is exposed under
      expr:   100 * (node_memory_free_bytes / node_memory_size_bytes)  # PromQL query the rule evaluates

# /usr/local/etc/prometheus.yml

rule_files:
  - "alertmanager/rules/memory_rules.yml"  # path relative to prometheus.yml

# restart so the new rule file is loaded
service prometheus restart

After the restart, the node_available_memory_percent metric is computed at every rule evaluation, stored as its own time series, and can be queried like any other metric.
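
How fresh the recorded series is depends on how often rule groups are evaluated. A minimal sketch of the relevant prometheus.yml settings follows; the 1m interval and the glob pattern are illustrative assumptions, not values taken from this setup.

```yaml
# /usr/local/etc/prometheus.yml (sketch, values are assumptions)

global:
  evaluation_interval: 1m        # how often rule groups are evaluated (Prometheus default is 1m)

rule_files:
  - "alertmanager/rules/*.yml"   # rule_files entries may be globs, so new rule files are picked up on reload
```

A single group can override the global setting with its own interval key in the rule file.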

Alert Rules

# /usr/local/etc/memory_alerts.yml

groups:
  - name: available_memory
    rules:
    - alert: sustained_high_memory_usage
      expr:  (1 - node_memory_free_bytes / node_memory_size_bytes) > 0.9  # used fraction = 1 - free/size
      for: 1h
      labels:
        severity: warning
      annotations:
        summary: "host {{ $labels.instance }} has sustained high memory usage"
        description: "{{ $labels.instance }} has had >=90% memory usage for at least an hour!"
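
An alert rule like this can be checked before deployment with a promtool unit test. The sketch below assumes the rule is meant to fire once used memory (1 - free/size) has exceeded 90% for an hour; the test file name, the host1 instance label, and the sample values are all hypothetical.

```yaml
# memory_alerts_test.yml (hypothetical file, placed next to memory_alerts.yml)

rule_files:
  - memory_alerts.yml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # 5 of 100 bytes free for two hours, i.e. 95% used
      - series: 'node_memory_free_bytes{instance="host1"}'
        values: '5+0x120'
      - series: 'node_memory_size_bytes{instance="host1"}'
        values: '100+0x120'
    alert_rule_test:
      # at 30m the condition has not yet held for 1h, so nothing is firing
      - eval_time: 30m
        alertname: sustained_high_memory_usage
        exp_alerts: []
      # at 90m the "for: 1h" hold time has passed
      - eval_time: 90m
        alertname: sustained_high_memory_usage
        exp_alerts:
          - exp_labels:
              severity: warning
              instance: host1
            exp_annotations:
              summary: "host host1 has sustained high memory usage"
              description: "host1 has had >=90% memory usage for at least an hour!"
```

The test is run with promtool test rules memory_alerts_test.yml. By contrast, promtool check rules memory_alerts.yml only verifies that the rule file parses.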

Validation

Web UI

Alert rules currently loaded by Prometheus show up in the web UI under Alerts, even while they are inactive (not firing). All loaded rules, including recording rules, are listed under Status > Rules.