How do I get HDFS bytes read and written for Spark applications?


Question

I want to collect different metrics for my Spark application. If someone has an idea of how to get the HDFS bytes read and written, please let me know.

Answer

I was looking for the same information and couldn't find it anywhere: neither the Spark documentation nor the Spark users mailing list (even though some people have asked the question) gives this information.

However, I found some hints here and there on the Internet.

I'm working on the application logs (the ones provided by the history server), and it seems that the Input Metrics and Output Metrics present in the Task Metrics of each SparkListenerTaskEnd event give the amount of data read and written by each task.

{
  "Event": "SparkListenerTaskEnd",
  ...
  "Task Metrics": {
      ...
      "Input Metrics": {
        "Bytes Read": 268566528,
        "Records Read": 2796202
      },
      "Output Metrics": {
        "Bytes Written": 0,
        "Records Written": 0
      },
      ...
  },
  ...
}
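
One way to aggregate these per-task values over a whole application (a hypothetical sketch on my side, not part of the original answer) is to read the event-log file back with Spark itself, since it is a JSON-lines file. The path below is a made-up placeholder, and a compressed event log would need to be decompressed first.

// Sketch: sum the per-task I/O metrics from a history-server event log.
// The event-log path is a hypothetical placeholder.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

val spark = SparkSession.builder().appName("event-log-io").getOrCreate()

// Each line of the event log is one JSON event, so spark.read.json can parse it.
val events = spark.read.json("/tmp/spark-events/app-20240101000000-0000")

events
  .filter("Event = 'SparkListenerTaskEnd'")
  .selectExpr(
    "`Task Metrics`.`Input Metrics`.`Bytes Read` AS bytesRead",
    "`Task Metrics`.`Output Metrics`.`Bytes Written` AS bytesWritten")
  .agg(sum("bytesRead"), sum("bytesWritten"))
  .show()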

Note that I'm not 100% sure about this, but the logs I got seem to be consistent with this assumption :)

Also, if you are reading from the local filesystem, I think that will be mixed into the same metric.
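
Alternatively (again just a sketch, not something from the documentation), the same task-level metrics can be collected at runtime by registering a SparkListener, instead of parsing the logs afterwards. As noted above, these input metrics cover any Hadoop-compatible filesystem read, not only HDFS.

import java.util.concurrent.atomic.LongAdder
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Sums the bytes read/written reported by every finished task.
class IoMetricsListener extends SparkListener {
  val bytesRead = new LongAdder()
  val bytesWritten = new LongAdder()

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    // taskMetrics may be null for tasks that failed before reporting metrics
    Option(taskEnd.taskMetrics).foreach { m =>
      bytesRead.add(m.inputMetrics.bytesRead)
      bytesWritten.add(m.outputMetrics.bytesWritten)
    }
  }
}

// Usage: register the listener before running the jobs you want to measure.
// val listener = new IoMetricsListener()
// sc.addSparkListener(listener)
// ... run the jobs ...
// println(s"bytes read = ${listener.bytesRead.sum()}, written = ${listener.bytesWritten.sum()}")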
