Explanation for Hadoop Mapreduce Console Output

Question

I am a newbie in the Hadoop environment. I have set up a 2-node Hadoop cluster and run the sample MapReduce application (wordcount, actually). I got output like this:

File System Counters
    FILE: Number of bytes read=492
    FILE: Number of bytes written=6463014
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=71012
    HDFS: Number of bytes written=195
    HDFS: Number of read operations=404
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=2
Job Counters 
    Launched map tasks=80
    Launched reduce tasks=1
    Data-local map tasks=80
    Total time spent by all maps in occupied slots (ms)=429151
    Total time spent by all reduces in occupied slots (ms)=72374
Map-Reduce Framework
    Map input records=80
    Map output records=8
    Map output bytes=470
    Map output materialized bytes=966
    Input split bytes=11040
    Combine input records=0
    Combine output records=0
    Reduce input groups=1
    Reduce shuffle bytes=966
    Reduce input records=8
    Reduce output records=5
    Spilled Records=16
    Shuffled Maps =80
    Failed Shuffles=0
    Merged Map outputs=80
    GC time elapsed (ms)=5033
    CPU time spent (ms)=59310
    Physical memory (bytes) snapshot=18515763200
    Virtual memory (bytes) snapshot=169808543744
    Total committed heap usage (bytes)=14363394048
Shuffle Errors
    BAD_ID=0
    CONNECTION=0
    IO_ERROR=0
    WRONG_LENGTH=0
    WRONG_MAP=0
    WRONG_REDUCE=0
File Input Format Counters 
    Bytes Read=29603
File Output Format Counters 
    Bytes Written=195

Is there any explanation for each of these values? In particular:

  1. Total time spent by all maps in occupied slots (ms)
  2. Total time spent by all reduces in occupied slots (ms)
  3. CPU time spent (ms)
  4. Physical memory (bytes) snapshot
  5. Virtual memory (bytes) snapshot
  6. Total committed heap usage (bytes)

Answer

The MapReduce framework maintains counters while a job is being executed. These counters are shown to the user to help understand job statistics and to support benchmarking and performance analysis. Your job output shows some of these counters. There is a good explanation of counters in chapter 8 of the Definitive Guide; I suggest you check it.
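As a side note, the same listing the console prints can also be read programmatically once the job has finished. This is only a minimal sketch, assuming the standard org.apache.hadoop.mapreduce API and that you still hold the Job object from your driver; the class name PrintJobCounters is made up for illustration.

    import java.io.IOException;

    import org.apache.hadoop.mapreduce.Counter;
    import org.apache.hadoop.mapreduce.CounterGroup;
    import org.apache.hadoop.mapreduce.Counters;
    import org.apache.hadoop.mapreduce.Job;

    public class PrintJobCounters {

        // Walks every counter group of a completed job and prints
        // "group / name=value", roughly reproducing the console summary above.
        public static void printAll(Job job) throws IOException {
            Counters counters = job.getCounters();
            for (CounterGroup group : counters) {
                System.out.println(group.getDisplayName());
                for (Counter counter : group) {
                    System.out.printf("    %s=%d%n",
                            counter.getDisplayName(), counter.getValue());
                }
            }
        }
    }

Called right after job.waitForCompletion(true) in the wordcount driver, this prints the same groups (File System Counters, Job Counters, Map-Reduce Framework, ...) you pasted above.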

To explain the items you asked for (a sketch that reads these exact counters through the counter API follows this list):

1) Total time spent by all maps - The total time taken running map tasks, in milliseconds. This includes tasks that were started speculatively (speculative execution means launching a duplicate of a task that is running slower than expected; in layman's terms, a speculative task is a re-run of a particular map task).

2) Total time spent by all reduces - The total time taken running reduce tasks, in milliseconds.

3) CPU time - The cumulative CPU time of the tasks, in milliseconds.

4) Physical memory - The physical memory used by the tasks, in bytes; this also counts the memory used for spills.

5) Total virtual memory - The virtual memory used by the tasks, in bytes.

6) Total committed heap usage - The total amount of memory available in the JVM, in bytes.
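As mentioned before the list, here is the promised sketch that pulls exactly those six counters by their built-in enum keys. It assumes Hadoop 2.x, where the occupied-slot times live in JobCounter and the per-task aggregates in TaskCounter; the class and method names are only illustrative, and enum names can differ between Hadoop versions.

    import java.io.IOException;

    import org.apache.hadoop.mapreduce.Counters;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.JobCounter;
    import org.apache.hadoop.mapreduce.TaskCounter;

    public class SlotAndMemoryReport {

        // Fetches the six counters asked about, using their built-in enum keys.
        public static void report(Job job) throws IOException {
            Counters c = job.getCounters();

            // 1) and 2): time spent in occupied slots, tracked per job
            long mapSlotMillis    = c.findCounter(JobCounter.SLOTS_MILLIS_MAPS).getValue();
            long reduceSlotMillis = c.findCounter(JobCounter.SLOTS_MILLIS_REDUCES).getValue();

            // 3) to 6): totals aggregated over all map and reduce tasks
            long cpuMillis     = c.findCounter(TaskCounter.CPU_MILLISECONDS).getValue();
            long physicalBytes = c.findCounter(TaskCounter.PHYSICAL_MEMORY_BYTES).getValue();
            long virtualBytes  = c.findCounter(TaskCounter.VIRTUAL_MEMORY_BYTES).getValue();
            long heapBytes     = c.findCounter(TaskCounter.COMMITTED_HEAP_BYTES).getValue();

            System.out.println("Map slot time (ms):      " + mapSlotMillis);
            System.out.println("Reduce slot time (ms):   " + reduceSlotMillis);
            System.out.println("CPU time (ms):           " + cpuMillis);
            System.out.println("Physical memory (bytes): " + physicalBytes);
            System.out.println("Virtual memory (bytes):  " + virtualBytes);
            System.out.println("Committed heap (bytes):  " + heapBytes);
        }
    }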

Hope this helps. The categories of counters and their details are laid out neatly in the Definitive Guide; if you need any additional info, please let me know.

Thanks.

RAM is the primary memory used when processing a job. The data is brought into RAM and the job is processed while it is kept there. However, the data may be bigger than the RAM allocated. In such scenarios the operating system keeps part of the data on disk and swaps it to and from RAM, so that even a smaller amount of RAM is sufficient for files larger than memory. For example: if RAM is 64 MB and the file is 128 MB, then 64 MB is kept in RAM first and the other 64 MB on disk, and the two are swapped. It won't literally keep them as two 64 MB halves; internally the data is divided into segments/pages.

I just gave that example to aid understanding. Virtual memory is the concept of working with data bigger than RAM by using pages and swapping between disk and RAM. So in the case above, 64 MB of disk is effectively used as if it were RAM, which is why it is called virtual memory.

Hope you understand. If you are satisfied with the answer, please accept it. Let me know if you have any questions.

Heap is the JVM memory used for object storage, and its size is set via JVM options (JVM_OPTS) on the command line. Normally all Java programs need these settings.
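To make that concrete, here is a minimal sketch of where such heap settings usually go in a Hadoop 2.x (YARN) job driver. The mapreduce.* property names are the standard ones for that version; the 1024m/1536 values are purely illustrative, not recommendations.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class HeapSettingsExample {

        // Builds a job whose map and reduce task JVMs get an explicit heap size.
        public static Job newJob() throws Exception {
            Configuration conf = new Configuration();

            // JVM options handed to every map/reduce task JVM (heap set via -Xmx)
            conf.set("mapreduce.map.java.opts", "-Xmx1024m");
            conf.set("mapreduce.reduce.java.opts", "-Xmx1024m");

            // Container memory requested from YARN; kept a bit larger than the heap
            conf.setInt("mapreduce.map.memory.mb", 1536);
            conf.setInt("mapreduce.reduce.memory.mb", 1536);

            return Job.getInstance(conf, "wordcount");
        }
    }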
