Prometheus not receiving metrics from cadvisor in GKE

Problem Description

Heyo,

I've deployed a prometheus, grafana, kube-state-metrics, alertmanager, etc. setup using kubernetes in GKE v1.16.x. I've used https://github.com/do-community/doks-monitoring as a jumping off point for the yaml files.

I've been trying to debug a situation for a few days now and would be very grateful for some help. My prometheus nodes are not getting metrics from cadvisor.

  • All the services and pods in the deployments are running. prometheus, kube-state-metrics, node-exporter, all running - no errors.
  • The cadvisor targets in prometheus UI appear as "up".
  • Prometheus is able to collect other metrics from the cluster, but no pod/container level usage metrics.
  • I can see cadvisor metrics when I query kubectl get --raw "/api/v1/nodes/<your_node>/proxy/metrics/cadvisor", but when I look in prometheus for container_cpu_usage or container_memory_usage, there is no data.
  • My cadvisor scrape job config in prometheus
    - job_name: kubernetes-cadvisor
      honor_timestamps: true
      scrape_interval: 15s
      scrape_timeout: 10s
      metrics_path: /metrics/cadvisor
      scheme: https
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)

cribbed from the prometheus/docs/examples.
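
For reference, there is also a commonly used variant of that job which routes the cadvisor scrape through the API server proxy rather than scraping each kubelet directly (the same path kubectl get --raw uses); a sketch of that variant, assuming the default in-cluster API server address kubernetes.default.svc:443:

    - job_name: kubernetes-cadvisor
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      # Send the scrape to the API server and let it proxy to each node's
      # cadvisor endpoint, mirroring the kubectl get --raw path above.
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor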

I've tried a whole bunch of different variations on paths and scrape configs, but no luck. Based on the fact that I can query the metrics using kubectl get (they exist) it seems to me the issue is prometheus communicating with the cadvisor target.
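
One way to narrow that down is to replay the request Prometheus would make, from a shell inside the Prometheus pod, using the same mounted service-account token; a rough sketch, with NODE_IP standing in for one of the nodes' InternalIP addresses:

    # Run inside the Prometheus pod; prints only the HTTP status code
    # returned by the kubelet's cadvisor endpoint (port 10250).
    TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
    curl -sk -o /dev/null -w '%{http_code}\n' \
      -H "Authorization: Bearer ${TOKEN}" \
      "https://${NODE_IP}:10250/metrics/cadvisor"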

If anyone has experience getting this configured I'd sure appreciate some help debugging.

Cheers

Answer

Too frustrating; I've been digging into this for the past few days.

The issue started after the GKE master was upgraded from 1.15.12-gke.2 to 1.16.13-gke.401.

To confirm this, I did the same in another GKE cluster, and the result was the same.

The above configuration is giving a 403 Forbidden.
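
A 403 from the kubelet (or from the API server proxy path) usually means the service account whose token is mounted into the Prometheus pod is not authorized for the node metrics endpoints. A minimal sketch of the RBAC that is typically needed, with hypothetical names for the ClusterRole, service account, and namespace:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: prometheus-cadvisor   # hypothetical name
    rules:
    - apiGroups: [""]
      resources:
      - nodes
      - nodes/metrics
      - nodes/proxy
      verbs: ["get", "list", "watch"]
    - nonResourceURLs: ["/metrics", "/metrics/cadvisor"]
      verbs: ["get"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: prometheus-cadvisor   # hypothetical name
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: prometheus-cadvisor
    subjects:
    - kind: ServiceAccount
      name: prometheus        # hypothetical: the SA the Prometheus pod runs as
      namespace: monitoring   # hypothetical namespace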

[screenshot: 403 Forbidden response from the scrape target]
