How can I get details of GKE fluentbit related error?
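Question
We just noticed that some logs are missing in Stackdriver. We can list the log messages with kubectl logs, but for some reason some of them are never sent to Stackdriver Logging.
Example of a missing log entry: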
{"severity":"info","time":"2021-06-07T08:19:17.598Z","caller":"zap/options.go:212","msg":"finished unary call with code OK","grpc.start_time":"2021-06-07T08:19:17Z","system":"grpc","span.kind":"server","grpc.service":"manabie.tom.ChatService","grpc.method":"SendMessage","peer.address":"127.0.0.1:32806","userID":"xxxx","x-request-id":"xxxx","grpc.code":"OK","grpc.time_ms":48.04899978637695}
Checking the fluentbit daemon:
kubectl logs fluentbit-gke-xxxx -c fluentbit-gke -f --tail=1
I see some error logs such as:
W0607 08:16:55.066861 1 server.go:77] Received empty or invalid msgpack for tag kube_xxxxxxxx
W0607 08:16:59.072151 1 server.go:77] Received empty or invalid msgpack for tag kube_xxxxxxxx
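To see which tags are affected and how often, the warnings above can be tallied. The following is a minimal sketch, not part of the original question: it assumes the exporter writes klog-style lines like the two shown here, and the names count_invalid_msgpack and SAMPLE are made up for illustration.

```python
import re
from collections import Counter

# klog-style prefix: level letter + MMDD, time, thread id, file:line, then "] message".
# Assumption: every line of interest follows the shape of the two warnings above.
KLOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<tid>\d+) (?P<source>[^\]]+)\] (?P<msg>.*)$"
)
MSGPACK_WARNING = re.compile(
    r"Received empty or invalid msgpack for tag (?P<tag>\S+)"
)


def count_invalid_msgpack(lines):
    """Count 'empty or invalid msgpack' warnings per tag."""
    counts = Counter()
    for line in lines:
        parsed = KLOG_LINE.match(line.strip())
        if parsed is None:
            continue  # not a klog-style line, skip it
        warning = MSGPACK_WARNING.search(parsed.group("msg"))
        if warning:
            counts[warning.group("tag")] += 1
    return counts


# The two warning lines quoted in the question.
SAMPLE = [
    "W0607 08:16:55.066861 1 server.go:77] Received empty or invalid msgpack for tag kube_xxxxxxxx",
    "W0607 08:16:59.072151 1 server.go:77] Received empty or invalid msgpack for tag kube_xxxxxxxx",
]

for tag, hits in count_invalid_msgpack(SAMPLE).most_common():
    print(f"{tag}\t{hits}")
```

To run it on live output, one could pass sys.stdin to count_invalid_msgpack and pipe the kubectl logs command above into the script, without -f so that the stream ends.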
Describing the daemonset:
kubectl describe daemonset fluentbit-gke
Name: fluentbit-gke
Selector: component=fluentbit-gke,k8s-app=fluentbit-gke
Node-Selector: kubernetes.io/os=linux
Labels: addonmanager.kubernetes.io/mode=Reconcile
k8s-app=fluentbit-gke
kubernetes.io/cluster-service=true
Annotations: deprecated.daemonset.template.generation: 9
Desired Number of Nodes Scheduled: 4
Current Number of Nodes Scheduled: 4
Number of Nodes Scheduled with Up-to-date Pods: 4
Number of Nodes Scheduled with Available Pods: 4
Number of Nodes Misscheduled: 0
Pods Status: 4 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
Labels: component=fluentbit-gke
k8s-app=fluentbit-gke
kubernetes.io/cluster-service=true
Annotations: EnableNodeJournal: false
EnablePodSecurityPolicy: false
SystemOnlyLogging: false
components.gke.io/component-name: fluentbit
components.gke.io/component-version: 1.4.4
monitoring.gke.io/path: /api/v1/metrics/prometheus
Service Account: fluentbit-gke
Containers:
fluentbit:
Image: gke.gcr.io/fluent-bit:v1.5.7-gke.1
Port: 2020/TCP
Host Port: 2020/TCP
Limits:
memory: 250Mi
Requests:
cpu: 50m
memory: 100Mi
Liveness: http-get http://:2020/ delay=120s timeout=1s period=60s #success=1 #failure=3
Environment: <none>
Mounts:
/fluent-bit/etc/ from config-volume (rw)
/var/lib/docker/containers from varlibdockercontainers (ro)
/var/lib/kubelet/pods from varlibkubeletpods (rw)
/var/log from varlog (rw)
/var/run/google-fluentbit/pos-files from varrun (rw)
fluentbit-gke:
Image: gke.gcr.io/fluent-bit-gke-exporter:v0.16.2-gke.0
Port: 2021/TCP
Host Port: 2021/TCP
Command:
/fluent-bit-gke-exporter
--kubernetes-separator=_
--stackdriver-resource-model=k8s
--enable-pod-label-discovery
--pod-label-dot-replacement=_
--split-stdout-stderr
--logtostderr
Limits:
memory: 250Mi
Requests:
cpu: 50m
memory: 100Mi
Liveness: http-get http://:2021/healthz delay=120s timeout=1s period=60s #success=1 #failure=3
Environment: <none>
Mounts: <none>
Volumes:
varrun:
Type: HostPath (bare host directory volume)
Path: /var/run/google-fluentbit/pos-files
HostPathType:
varlog:
Type: HostPath (bare host directory volume)
Path: /var/log
HostPathType:
varlibkubeletpods:
Type: HostPath (bare host directory volume)
Path: /var/lib/kubelet/pods
HostPathType:
varlibdockercontainers:
Type: HostPath (bare host directory volume)
Path: /var/lib/docker/containers
HostPathType:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: fluentbit-gke-config-v1.0.6
Optional: false
Priority Class Name: system-node-critical
Events: <none>
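Recommended answer
You may be hitting a case where some log entries exceed the size limit of the Cloud Logging API.
Fluentbit-GKE stores its own logs in /var/log/fluentbit.log on each node, and these logs are not exported to Cloud Logging. That directory is a hostPath volume that mounts /var/log from the host node's filesystem into the Pod, so the log file can be read from the host itself. If you need these logs, fetch the fluentbit logs from the nodes and provide a copy: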
$ kubectl get nodes
$ gcloud compute scp <node_name>:/var/log/fluentbit.log* ./
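With several nodes, the two commands above can be repeated per node. The sketch below only builds the command lines and leaves running them to the caller. It rests on a few assumptions: node names are read from kubectl get nodes -o name (which prints lines of the form node/<name>), gcloud has a default zone configured or is given --zone separately, and the helper names and the per-node destination directory are hypothetical.

```python
def node_names(kubectl_output):
    """Extract node names from `kubectl get nodes -o name` output.

    Each line is expected to look like `node/<name>`; other lines are ignored.
    """
    names = []
    for line in kubectl_output.splitlines():
        line = line.strip()
        if line.startswith("node/"):
            names.append(line.split("/", 1)[1])
    return names


def scp_command(node, dest_dir):
    """Build the gcloud command from the answer for one node.

    dest_dir is a hypothetical per-node directory, used so that files
    copied from different nodes do not overwrite each other.
    """
    return [
        "gcloud", "compute", "scp",
        f"{node}:/var/log/fluentbit.log*",
        f"{dest_dir}/",
    ]


# Example with made-up node names, only to show the resulting command lines.
example_output = "node/example-node-a\nnode/example-node-b\n"
for name in node_names(example_output):
    print(" ".join(scp_command(name, f"fluentbit-logs/{name}")))
```

Each returned list can be handed to subprocess.run(..., check=True) after the destination directory has been created, for example with os.makedirs(dest_dir, exist_ok=True).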
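Unlike Fluentd, Fluentbit in GKE 1.17 currently has a maximum size of 32K for a single log entry. As a result, Fluentbit drops user logs larger than 32K and does not export them to Cloud Logging. On GKE 1.18 clusters, the maximum size of a single log entry has been increased to 1MB. That is the size Fluentbit will ingest; however, fluentbit trims the entry to 200kb before writing it to Cloud Logging, to leave room for the additional metadata that is added to the entry. This is because the Cloud Logging API has a 256 KB limit on the size of a log entry.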
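To check whether a given application log line would be affected, its byte size can be compared against the limits described above. This is a rough sketch under stated assumptions: "K" and "MB" are read as multiples of 1024 bytes, size is measured on the UTF-8 encoded line, and the function name and result labels are invented for illustration. The answer does not say what happens to entries above 1MB on GKE 1.18, so that case is only flagged.

```python
KIB = 1024  # assumption: the answer's "K"/"kb"/"MB" are 1024-based

DROP_LIMIT_GKE_1_17 = 32 * KIB       # entries above this are dropped on 1.17
INGEST_LIMIT_GKE_1_18 = 1024 * KIB   # maximum size ingested on 1.18
TRIM_SIZE_GKE_1_18 = 200 * KIB       # entries are cut to this size on 1.18


def classify_entry(line, gke_version="1.18"):
    """Return a rough verdict for one log line on GKE 1.17 or 1.18."""
    size = len(line.encode("utf-8"))
    if gke_version == "1.17":
        return "dropped" if size > DROP_LIMIT_GKE_1_17 else "ok"
    if gke_version == "1.18":
        if size > INGEST_LIMIT_GKE_1_18:
            # Behaviour above the ingest limit is not described in the answer.
            return "over ingest limit"
        if size > TRIM_SIZE_GKE_1_18:
            return "trimmed"
        return "ok"
    raise ValueError(f"no limits known for GKE version {gke_version!r}")


# A 40 KiB line is dropped on 1.17 but passes through untouched on 1.18.
big_line = "x" * (40 * KIB)
print(classify_entry(big_line, "1.17"), classify_entry(big_line, "1.18"))
```

Feeding each line of kubectl logs output through classify_entry gives a quick indication of whether the size limits can explain a missing entry.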