Alertmanager configuration

From wikinotes

Documentation

official docs https://prometheus.io/docs/alerting/latest/configuration/
sample config https://github.com/prometheus/alertmanager#example

Locations

${PREFIX}/etc/alertmanager/alertmanager.yml alertmanager config
${PREFIX}/etc/prometheus.yml prometheus config

Prometheus

Prometheus evaluates the alert-rules and fires alerts when their conditions hold. It does not send notifications itself; it forwards fired alerts to Alertmanager.

Config

Configure Prometheus to load alert-rules and to send fired alerts to Alertmanager.

# /usr/local/etc/prometheus.yml

# issue alerts to alertmanager at localhost:9093
alerting:
  alertmanagers:
    - static_configs:
      - targets: ['localhost:9093']

# load alert-rules defined in 'memory_rules.yml'
rule_files:
  - "memory_rules.yml"
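The memory rule in the next section reads node_exporter metrics, so Prometheus also has to scrape an exporter, and it evaluates rules on its own interval. A minimal sketch of those extra keys, assuming node_exporter runs on the same host on its default port 9100 (the job name `node` and the 15s intervals are illustrative choices, not values from the original notes):

```yaml
# /usr/local/etc/prometheus.yml (continued)

global:
  scrape_interval: 15s       # how often targets are scraped
  evaluation_interval: 15s   # how often alert-rules are evaluated

# hypothetical job scraping a local node_exporter (default port 9100)
scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']
```

Both intervals default to 1m when omitted; a rule with `for: 1h` only needs the condition to hold on every evaluation during that hour.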

Alert Rules

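Define an alert-rule (see prometheus rules for more details). The `for` clause keeps the alert in a pending state until its expression has been true for the whole duration; only then does it fire.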

# /usr/local/etc/memory_rules.yml

groups:
  - name: available_memory
    rules:
    - alert: sustained_high_memory_usage
      expr: (1 - node_memory_free_bytes / node_memory_size_bytes) >= 0.9   # used fraction = 1 - free/size
      for: 1h
      labels:
        severity: warning
      annotations:
        summary: "host {{ $labels.instance }} has sustained high memory usage"
        description: "{{ $labels.instance }} has had >=90% memory usage for at least an hour!"
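On Linux hosts, node_exporter exposes memory as `/proc/meminfo` fields (e.g. `node_memory_MemAvailable_bytes`, `node_memory_MemTotal_bytes`), so the metric names used above will not exist there. A sketch of the same rule for a Linux host, assuming you want "available" memory (which counts reclaimable cache) rather than strictly free memory:

```yaml
# /usr/local/etc/memory_rules.yml (Linux variant, sketch)

groups:
  - name: available_memory
    rules:
    - alert: sustained_high_memory_usage
      # used fraction = 1 - available/total; fires once >= 90% for a full hour
      expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) >= 0.9
      for: 1h
      labels:
        severity: warning
      annotations:
        summary: "host {{ $labels.instance }} has sustained high memory usage"
        description: "{{ $labels.instance }} has had >=90% memory usage for at least an hour!"
```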

Alertmanager

Alertmanager receives the alerts fired by Prometheus, then groups, deduplicates, silences and routes them, deciding if, how, and to whom a notification is sent.

Overview

Alertmanager can send notifications through various integrations.
See the docs for all options (ex. email, webhook (http), pagerduty, slack, ...)

# /usr/local/etc/alertmanager/alertmanager.yml

global:          # general settings
route:           # root-route, where alerts enter
receivers:       # alerts are issued to receivers

templates:       # configure template locations (templates for alert messages)
inhibit_rules:   # rules to mute alerts, when other alerts are already firing
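A minimal, filled-in sketch of that layout (`templates` and `inhibit_rules` are optional and omitted). The SMTP relay and sender address are hypothetical placeholders; an email receiver needs a smarthost and a `from` address, set either in `global` or per `email_configs` entry, or Alertmanager rejects the config on load.

```yaml
# /usr/local/etc/alertmanager/alertmanager.yml (minimal sketch)

global:
  smtp_smarthost: 'smtp.example.org:587'   # hypothetical SMTP relay (host:port)
  smtp_from: 'alertmanager@example.org'    # hypothetical sender address

route:
  receiver: team-X-mails                   # every alert lands here unless a child route matches
  group_by: ['alertname']

receivers:
  - name: 'team-X-mails'
    email_configs:
      - to: 'team-X+alerts@example.org'
```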

Routes

Routes form a tree that matches alerts by their labels. They decide which receiver gets a notification, and when/how often it is sent.

# /usr/local/etc/alertmanager/alertmanager.yml

route:
  receiver: team-X-mails              # default receiver; used when no child route matches
  group_by: ['cluster', 'alertname']  # alerts sharing these label values are batched into one notification
  repeat_interval: 3h                 # re-send the notification after this interval while the alerts keep firing

  # optionally, match on alert-labels to pick a different receiver
  # (routes can be nested for gradually more specific rules;
  #  the first matching child route wins unless it sets 'continue: true')
  routes:
    - matchers:                       # 'matchers' syntax requires alertmanager >= 0.22
        - service=~"ha|nginx|wsgi"    # if the alert's 'service' label matches this regex, use this receiver
      receiver: team-X-mails
      routes:                         # you can recursively nest routes, for more specific label matches
        - matchers:
            - severity="critical"
          receiver: team-X-pager
    - matchers:
      # ...
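Besides `repeat_interval`, two more timers control when grouped notifications go out, and `continue` overrides the default first-match behaviour of child routes. A sketch; `group_wait` and `group_interval` are shown at Alertmanager's defaults (30s and 5m), and the `team-X-audit` receiver is hypothetical and would also need an entry under `receivers`:

```yaml
# /usr/local/etc/alertmanager/alertmanager.yml

route:
  receiver: team-X-mails
  group_by: ['cluster', 'alertname']
  group_wait: 30s        # wait before the first notification of a new group, to batch early alerts
  group_interval: 5m     # wait before notifying about new alerts added to an existing group
  repeat_interval: 3h    # re-send if the group is still firing and nothing changed

  routes:
    # continue: true keeps evaluating the following sibling routes, so critical
    # alerts go to the (hypothetical) audit receiver AND can still match below
    - matchers:
        - severity="critical"
      receiver: team-X-audit
      continue: true
    - matchers:
        - service=~"ha|nginx|wsgi"
      receiver: team-X-mails
```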

Receivers

Receivers are named notification targets referenced by routes; each bundles one or more communication methods for delivering an alert.
ex: email, slack, pagerduty, ...

# /usr/local/etc/alertmanager/alertmanager.yml

receivers:
  # send email
  # (needs an SMTP smarthost and 'from' address, set in 'global' or per email_config)
  - name: 'team-X-mails'
    email_configs:
      - to: 'team-X+alerts@example.org'

  # send email AND page w/ pagerduty
  - name: 'team-X-pager'
    email_configs:
      - to: 'team-X+alerts-critical@example.org'
    pagerduty_configs:
      - service_key: 'abcdefg'        # key for PagerDuty's 'Prometheus' integration type; use routing_key for 'Events API v2'
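The overview also lists `inhibit_rules`, which no example here uses. A sketch that mutes `warning` alerts while a `critical` alert with the same `alertname` and `instance` labels is firing (the `source_matchers`/`target_matchers` syntax assumes Alertmanager 0.22 or newer):

```yaml
# /usr/local/etc/alertmanager/alertmanager.yml

inhibit_rules:
  # while a critical alert is firing, suppress warning alerts...
  - source_matchers:
      - severity="critical"
    target_matchers:
      - severity="warning"
    # ...but only those whose alertname and instance equal the critical alert's
    equal: ['alertname', 'instance']
```

Without `equal`, one critical alert anywhere would silence every warning, which is rarely what you want.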