How can I get details of GKE fluentbit related error?

Problem description

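We just noticed that some logs are missing in StackDriver. We can list the log messages with kubectl logs, but for some reason some of those messages are not sent to StackDriver logging.
Example of a missing log entry: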

{"severity":"info","time":"2021-06-07T08:19:17.598Z","caller":"zap/options.go:212","msg":"finished unary call with code OK","grpc.start_time":"2021-06-07T08:19:17Z","system":"grpc","span.kind":"server","grpc.service":"manabie.tom.ChatService","grpc.method":"SendMessage","peer.address":"127.0.0.1:32806","userID":"xxxx","x-request-id":"xxxx","grpc.code":"OK","grpc.time_ms":48.04899978637695}

Checking the fluentbit daemon:

kubectl logs fluentbit-gke-xxxx -c fluentbit-gke -f --tail=1 

I see some error logs such as:

W0607 08:16:55.066861       1 server.go:77] Received empty or invalid msgpack for tag kube_xxxxxxxx
W0607 08:16:59.072151       1 server.go:77] Received empty or invalid msgpack for tag kube_xxxxxxxx
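
To see which tags are affected and how often, warnings like the ones above can be tallied per tag. The following is a minimal sketch, assuming the exporter keeps the message format shown here with the tag as the last field; the function name `count_msgpack_warnings` is made up for illustration.

```shell
#!/usr/bin/env bash
# count_msgpack_warnings: reads exporter log lines on stdin and prints
# "<count> <tag>" for each tag that triggered the warning, most frequent first.
# Assumption: the tag is the last whitespace-separated field of the line.
count_msgpack_warnings() {
  awk '/Received empty or invalid msgpack for tag/ { count[$NF]++ }
       END { for (tag in count) print count[tag], tag }' \
    | sort -rn
}

# Usage (the pod name is a placeholder, as in the question):
#   kubectl logs fluentbit-gke-xxxx -c fluentbit-gke | count_msgpack_warnings
```

Lines that do not contain the warning text are ignored, so the full container log can be piped in unfiltered.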

Describing the daemonset:

kubectl describe daemonset fluentbit-gke
Name:           fluentbit-gke
Selector:       component=fluentbit-gke,k8s-app=fluentbit-gke
Node-Selector:  kubernetes.io/os=linux
Labels:         addonmanager.kubernetes.io/mode=Reconcile
                k8s-app=fluentbit-gke
                kubernetes.io/cluster-service=true
Annotations:    deprecated.daemonset.template.generation: 9
Desired Number of Nodes Scheduled: 4
Current Number of Nodes Scheduled: 4
Number of Nodes Scheduled with Up-to-date Pods: 4
Number of Nodes Scheduled with Available Pods: 4
Number of Nodes Misscheduled: 0
Pods Status:  4 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           component=fluentbit-gke
                    k8s-app=fluentbit-gke
                    kubernetes.io/cluster-service=true
  Annotations:      EnableNodeJournal: false
                    EnablePodSecurityPolicy: false
                    SystemOnlyLogging: false
                    components.gke.io/component-name: fluentbit
                    components.gke.io/component-version: 1.4.4
                    monitoring.gke.io/path: /api/v1/metrics/prometheus
  Service Account:  fluentbit-gke
  Containers:
   fluentbit:
    Image:      gke.gcr.io/fluent-bit:v1.5.7-gke.1
    Port:       2020/TCP
    Host Port:  2020/TCP
    Limits:
      memory:  250Mi
    Requests:
      cpu:        50m
      memory:     100Mi
    Liveness:     http-get http://:2020/ delay=120s timeout=1s period=60s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /fluent-bit/etc/ from config-volume (rw)
      /var/lib/docker/containers from varlibdockercontainers (ro)
      /var/lib/kubelet/pods from varlibkubeletpods (rw)
      /var/log from varlog (rw)
      /var/run/google-fluentbit/pos-files from varrun (rw)
   fluentbit-gke:
    Image:      gke.gcr.io/fluent-bit-gke-exporter:v0.16.2-gke.0
    Port:       2021/TCP
    Host Port:  2021/TCP
    Command:
      /fluent-bit-gke-exporter
      --kubernetes-separator=_
      --stackdriver-resource-model=k8s
      --enable-pod-label-discovery
      --pod-label-dot-replacement=_
      --split-stdout-stderr
      --logtostderr
    Limits:
      memory:  250Mi
    Requests:
      cpu:        50m
      memory:     100Mi
    Liveness:     http-get http://:2021/healthz delay=120s timeout=1s period=60s #success=1 #failure=3
    Environment:  <none>
    Mounts:       <none>
  Volumes:
   varrun:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/google-fluentbit/pos-files
    HostPathType:  
   varlog:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log
    HostPathType:  
   varlibkubeletpods:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/pods
    HostPathType:  
   varlibdockercontainers:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/docker/containers
    HostPathType:  
   config-volume:
    Type:               ConfigMap (a volume populated by a ConfigMap)
    Name:               fluentbit-gke-config-v1.0.6
    Optional:           false
  Priority Class Name:  system-node-critical
Events:                 <none>

Recommended answer

You may be running into a case where some logs exceed the Cloud Logging API size limit.

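Fluentbit-GKE stores its own logs in /var/log/fluentbit.log on each node, and these logs are not exported to Cloud Logging. This directory is a hostPath volume that mounts /var/log from the host node's filesystem into the Pod, so the log file can be accessed from the host itself. If you need these logs, fetch the fluentbit logs from the node and provide a copy:

$ kubectl get nodes
$ gcloud compute scp <node_name>:/var/log/fluentbit.log* ./

Unlike Fluentd, Fluentbit in GKE 1.17 currently has a maximum size of 32K for a single log entry. As a result, Fluentbit drops user logs larger than 32K and does not export them to Cloud Logging. On GKE 1.18 clusters, the maximum size of a single log entry has been increased to 1MB. This is the size that Fluentbit will ingest; however, fluentbit truncates the entry to 200 KB to leave some room for the additional metadata that is added to the entry before it is written to Cloud Logging. This is because the Cloud Logging API has a 256 KB limit on the size of a log entry.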
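
To check whether an application actually emits entries large enough to hit these limits, the length of each log line can be compared against them. This is a rough sketch rather than a definitive check: it assumes "32K" and "200 KB" mean multiples of 1024 bytes, it measures the raw line as printed by kubectl logs (before any metadata is added), and the names `report_oversized`, `LIMIT_32K` and `LIMIT_200K` are made up for illustration.

```shell
#!/usr/bin/env bash
# Thresholds taken from the answer above; 1K is assumed to be 1024 bytes.
LIMIT_32K=$((32 * 1024))     # per-entry limit described for GKE 1.17
LIMIT_200K=$((200 * 1024))   # size entries are truncated to on GKE 1.18

# report_oversized: reads log lines on stdin and prints
# "<line number> <bytes> <label>" for every line longer than 32K.
# LC_ALL=C makes awk count bytes rather than multi-byte characters.
report_oversized() {
  LC_ALL=C awk -v small="$LIMIT_32K" -v big="$LIMIT_200K" '
    {
      n = length($0)
      if (n > big)        print NR, n, "over-200K"
      else if (n > small) print NR, n, "over-32K"
    }'
}

# Usage (the pod name is a placeholder):
#   kubectl logs <pod_name> | report_oversized
```

No output means that every line read was at or below the 32K threshold, in which case the size limit is less likely to explain the missing entries.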
