Alertmanager configuration
Documentation
official docs: https://prometheus.io/docs/alerting/latest/configuration/
sample config: https://github.com/prometheus/alertmanager#example
Locations
${PREFIX}/etc/alertmanager/alertmanager.yml    alertmanager config
${PREFIX}/prometheus.yml                       prometheus config
Prometheus
Prometheus evaluates the alert rules and sends firing alerts to Alertmanager.
Config
Configure Prometheus to load alert rules and to send the resulting alerts to Alertmanager.
    # /usr/local/etc/prometheus.yml

    # issue alerts to alertmanager at localhost:9093
    alerting:
      alertmanagers:
        - static_configs:
            - targets: ['localhost:9093']

    # load alert-rules defined in 'memory_rules.yml'
    rule_files:
      - "memory_rules.yml"

Alert Rules
Configure an alert-rule (see prometheus rules for more details).
    # /usr/local/etc/memory_rules.yml
    groups:
      - name: available_memory
        rules:
          - alert: sustained_high_memory_usage
            # used fraction = 1 - (free / total); fire when more than 90% is in use
            expr: (1 - node_memory_free_bytes / node_memory_size_bytes) > 0.9
            for: 1h
            labels:
              severity: warning
            annotations:
              summary: "host {{ $labels.instance }} has sustained high memory usage"
              description: "{{ $labels.instance }} has had >90% memory usage for at least an hour!"

AlertManager
Alertmanager consumes fired alerts and decides whether to send a notification, how to send it, and who receives it.
Overview
AlertManager can issue notifications using various methods.
See docs for all options (ex. email, http, pagerduty, slack, ...)

    # /usr/local/etc/alertmanager/alertmanager.yml
    global:         # general settings
    route:          # root-route, where alerts enter
    receivers:      # alerts are issued to receivers
    templates:      # configure template locations (templates for alert messages)
    inhibit_rules:  # rules to mute alerts, when other alerts are already firing

Routes
Routes determine when/how notifications are sent, and who they are sent to.
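The "when" part is controlled by timing settings on a route. The fragment below is a minimal sketch rather than a recommended setup: the receiver name and the durations are illustrative assumptions (30s and 5m are the documented defaults for group_wait and group_interval).

```yaml
# /usr/local/etc/alertmanager/alertmanager.yml (timing settings only)
route:
  receiver: team-X-mails               # illustrative receiver name
  group_by: ['cluster', 'alertname']
  group_wait: 30s        # wait this long before the first notification for a new group
  group_interval: 5m     # wait this long before notifying about alerts added to an existing group
  repeat_interval: 3h    # wait this long before re-sending a notification that is still firing
```

Child routes inherit these values unless they override them.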
    # /usr/local/etc/alertmanager/alertmanager.yml
    route:
      receiver: team-X-mails               # default receiver, used when no child route matches
      group_by: ['cluster', 'alertname']   # alerts are batched by these labels; one notification per batch at a time
      repeat_interval: 3h                  # re-issue the notification after this interval if not resolved

      # optionally, you can match on alert-labels
      # and alter the alert/receiver
      # (can be nested for gradually more specific rules)
      routes:
        # 'matchers' needs Alertmanager 0.22 or newer; older versions used match / match_re
        - matchers:
            - service=~"ha|nginx|wsgi"     # if the alert's label matches the regex, issue to this receiver
          receiver: team-X-mails
          routes:                          # routes can be nested recursively, for more specific label matches
            - matchers:
                - severity="critical"
              receiver: team-X-pager
        - matchers: # ...

Receivers
Receivers are used in routes, and represent a communication method for issuing an alert.
ex: email, slack, pagerduty, ...

    # /usr/local/etc/alertmanager/alertmanager.yml
    receivers:
      # send email
      - name: 'team-X-mails'
        email_configs:
          - to: 'team-X+alerts@example.org'

      # send email AND page w/ pagerduty
      - name: 'team-X-pager'
        email_configs:
          - to: 'team-X+alerts-critical@example.org'
        pagerduty_configs:
          - service_key: 'abcdefg'
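
Email receivers also need to know which SMTP server to send through and which sender address to use. These are usually set once in the global section. A minimal sketch, assuming a mail relay on the local host that does not require authentication or TLS (the host, port, and sender address are placeholders, not values from these notes):

```yaml
# /usr/local/etc/alertmanager/alertmanager.yml (global section only)
global:
  smtp_smarthost: 'localhost:25'          # assumed local mail relay, as host:port
  smtp_from: 'alertmanager@example.org'   # assumed sender address
  smtp_require_tls: false                 # assumption: the relay does not offer STARTTLS (default is true)
```

A single receiver can override these values inside its own email_configs entry if it needs a different server.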