Why the configuration did not take effect, and the key configuration points:

- Everything in the rules file is rendered into the body of the alert notification.
- The `link` setting in the Slack configuration puts a clickable link into the Slack message, so the people watching the channel can jump straight to where the alert fired (or to any page you want them to see).
- The `username` in the rule file may contain Chinese characters.
- In the Slack section of alertmanager.yml, `api_url` must not be quoted, and `channel` must be set explicitly; otherwise Alertmanager reports an error like the following:
```
level=error ts=2018-10-19T08:42:36.63691218Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="cancelling notify retry for \"slack\" due to unrecoverable error: unexpected status code 404"
```
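The quoting rules above can be sketched as a minimal Slack receiver fragment (the receiver name, webhook URL, and channel below are placeholder values, not the ones from this setup):

```yaml
# Fragment of alertmanager.yml -- hypothetical values for illustration
receivers:
  - name: 'example'
    slack_configs:
      - api_url: https://hooks.slack.com/services/XXX/YYY/ZZZ  # no quotes around the URL
        channel: "#my-channel"                                 # must be specified explicitly
```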
Example:
```yaml
# prometheus.yml
# my global config
global:
  scrape_interval: 15s     # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/usr/local/prometheus-2.4.3/rules/test.yml" # Either put the rules file in the same directory as prometheus.yml or use an absolute path; other relative paths cannot be read

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['127.0.0.1:9090']
        labels:
          instance: localhost
  - job_name: 'linux'
    static_configs:
      - targets: ['127.0.0.1:9100']
        labels:
          instance: node1
      - targets: ['172.18.2.28:9090']
        labels:
          instance: node2
      - targets: ['172.18.2.28:1234']
        labels:
          instance: node3
```

```yaml
# rules/test.yml
groups:
  - name: test
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: page
        annotations:
          description: '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute.'
          summary: 'Instance {{ $labels.instance }} down'
          link: 'http://172.18.2.27:9090/alerts'
          color: "#D00000" # Color of the Slack attachment; #D00000 is red
          username: "刘蓉"
```

```yaml
# alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: 'lori_liurong@163.com'
  smtp_auth_username: 'lori_liurong@163.com'
  smtp_auth_password: 'liurong199686'
  smtp_require_tls: false

route:
  group_by: ['ip','id','type']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 2h # Interval between repeated notifications for an alert that has already been sent successfully
  receiver: 'liurong'

receivers:
  - name: 'liurong'
    email_configs:
      - to: 'lori_liurong@163.com'
        headers: { Subject: "[WARN] 报警邮件test" }
    slack_configs:
      - send_resolved: true
        api_url: https://hooks.slack.com/services/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 # Slack webhook URL; each channel has its own webhook URL
        channel: "#test-alermanager"
        text: "{{ range .Alerts }} {{ .Annotations.description}}\n {{end}} @{{ .CommonAnnotations.username}} <{{.CommonAnnotations.link}}| click here>"
        title: "{{.CommonAnnotations.summary}}"
        title_link: "{{.CommonAnnotations.link}}"
        color: "{{.CommonAnnotations.color}}"
```
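After editing, the three files can be checked for syntax errors before restarting anything, using the tools shipped with Prometheus and Alertmanager (a sketch; the paths assume the layout above):

```shell
# Validate prometheus.yml and the rule file with promtool (ships with Prometheus)
promtool check config /usr/local/prometheus-2.4.3/prometheus.yml
promtool check rules /usr/local/prometheus-2.4.3/rules/test.yml

# Validate alertmanager.yml with amtool (ships with Alertmanager)
amtool check-config alertmanager.yml
```

Note that `promtool check config` also follows `rule_files` entries, so a bad rules path shows up here rather than only at startup.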
When Alertmanager evaluates the rules, any alerts that are currently firing will show up. For a detailed explanation see: http://blog.51cto.com/xujpxm/2055970
Log output
Where can I find the Prometheus logs?
https://github.com/prometheus/prometheus/issues/2363
Start Prometheus with a wrapper script and specify the log output path there.
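Prometheus writes its logs to standard error, so a start script only needs to redirect both output streams to a file. A minimal sketch (the log path is an assumption; the install directory matches the one used above):

```shell
#!/bin/bash
# Hypothetical start script for Prometheus.
# Logs go to stderr, so redirect both stdout and stderr to a log file.
cd /usr/local/prometheus-2.4.3
nohup ./prometheus --config.file=prometheus.yml \
  >> /var/log/prometheus/prometheus.log 2>&1 &
```

The same pattern works for Alertmanager with `./alertmanager --config.file=alertmanager.yml` and its own log file.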