hadoop + Writable interface + readFields throws an exception in reducer


Problem description


I have a simple map-reduce program in which my map and reduce primitives look like this:

map(K,V) = (Text, OutputAggregator)
reduce(Text, OutputAggregator) = (Text,Text)

The important point is that from my map function I emit an object of type OutputAggregator, which is my own class implementing the Writable interface. However, my reduce fails with the following exception. More specifically, the readFields() function is throwing the exception. Any clue why? I am using Hadoop 0.18.3.

10/09/19 04:04:59 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
10/09/19 04:04:59 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/09/19 04:04:59 INFO mapred.FileInputFormat: Total input paths to process : 1
10/09/19 04:04:59 INFO mapred.FileInputFormat: Total input paths to process : 1
10/09/19 04:04:59 INFO mapred.FileInputFormat: Total input paths to process : 1
10/09/19 04:04:59 INFO mapred.FileInputFormat: Total input paths to process : 1
10/09/19 04:04:59 INFO mapred.JobClient: Running job: job_local_0001
10/09/19 04:04:59 INFO mapred.MapTask: numReduceTasks: 1
10/09/19 04:04:59 INFO mapred.MapTask: io.sort.mb = 100
10/09/19 04:04:59 INFO mapred.MapTask: data buffer = 79691776/99614720
10/09/19 04:04:59 INFO mapred.MapTask: record buffer = 262144/327680
Length = 10
10
10/09/19 04:04:59 INFO mapred.MapTask: Starting flush of map output
10/09/19 04:04:59 INFO mapred.MapTask: bufstart = 0; bufend = 231; bufvoid = 99614720
10/09/19 04:04:59 INFO mapred.MapTask: kvstart = 0; kvend = 10; length = 327680
gl_books
10/09/19 04:04:59 WARN mapred.LocalJobRunner: job_local_0001
java.lang.NullPointerException
 at org.myorg.OutputAggregator.readFields(OutputAggregator.java:46)
 at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
 at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
 at org.apache.hadoop.mapred.Task$ValuesIterator.readNextValue(Task.java:751)
 at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:691)
 at org.apache.hadoop.mapred.Task$CombineValuesIterator.next(Task.java:770)
 at org.myorg.xxxParallelizer$Reduce.reduce(xxxParallelizer.java:117)
 at org.myorg.xxxParallelizer$Reduce.reduce(xxxParallelizer.java:1)
 at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:904)
 at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:785)
 at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:228)
 at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:157)
java.io.IOException: Job failed!
 at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1113)
 at org.myorg.xxxParallelizer.main(xxxParallelizer.java:145)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
 at java.lang.reflect.Method.invoke(Unknown Source)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
 at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
 at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
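
(For context: the trace shows the failure inside MapOutputBuffer.combineAndSpill and CombineValuesIterator, so the Reduce class is also registered as a combiner and readFields is already being called during the map-side spill. With the old org.apache.hadoop.mapred API that 0.18.3 uses, the map/reduce primitives above would correspond to declarations roughly like the sketch below. This is illustrative only; the real xxxParallelizer code is not posted, and the input key/value types LongWritable/Text are assumptions.)

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Illustrative sketch of the job class implied by the signatures above.
public class xxxParallelizer {

  public static class Map extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, OutputAggregator> {
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, OutputAggregator> output,
                    Reporter reporter) throws IOException {
      // ... build an OutputAggregator for this record and emit it ...
    }
  }

  public static class Reduce extends MapReduceBase
      implements Reducer<Text, OutputAggregator, Text, Text> {
    public void reduce(Text key, Iterator<OutputAggregator> values,
                       OutputCollector<Text, Text> output,
                       Reporter reporter) throws IOException {
      // ... fold the aggregators together and emit a (Text, Text) pair ...
    }
  }
}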

Solution

When posting a question about custom code, post the relevant piece of code. So the content of line 46 and a few lines before and after it would really help... :)

However, this may help:

THE pitfall when writing your own Writable class is that Hadoop reuses the actual instance of the class over and over again. Between calls to readFields you do NOT get a shiny new instance.

So at the start of the readFields method you MUST assume the object you are in is filled with "garbage" and must be cleared before continuing.

My suggestion is to implement a "clear()" method that fully wipes the current instance and resets it to the state it was in the moment construction completed. And of course, call that method as the first thing in readFields for both the key and the value.
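
A minimal sketch of that pattern (the real OutputAggregator is not posted, so the count/items fields below are invented; the clear()-first structure is the point):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.Writable;

public class OutputAggregator implements Writable {

  private int count;                                     // illustrative field
  private List<String> items = new ArrayList<String>();  // illustrative field

  // Reset this instance to its freshly-constructed state. Hadoop hands
  // readFields the SAME object it used for the previous record, so any
  // state left over from that record must be wiped here.
  public void clear() {
    count = 0;
    items.clear();
  }

  public void write(DataOutput out) throws IOException {
    out.writeInt(count);
    out.writeInt(items.size());
    for (String item : items) {
      out.writeUTF(item);
    }
  }

  public void readFields(DataInput in) throws IOException {
    clear(); // first thing: wipe the leftovers from the previous record
    count = in.readInt();
    int n = in.readInt();
    for (int i = 0; i < n; i++) {
      items.add(in.readUTF());
    }
  }
}

Also worth checking what line 46 of OutputAggregator.java dereferences: Hadoop creates Writable instances through the no-argument constructor when deserializing, so any field that is only initialized by a different constructor (or by your map code) will still be null when readFields runs.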

HTH
