异常而连接在火花的MongoDB [英] Exception while connecting to mongodb in spark
问题描述
我得到java.lang.IllegalStateException:没有准备好,在org.bson.BasicBSONDe coder._de code,而尝试使用MongoDB的输入RDD:
I get "java.lang.IllegalStateException: not ready" in org.bson.BasicBSONDecoder._decode while trying to use MongoDB as input RDD:
Configuration conf = new Configuration();
conf.set("mongo.input.uri", "mongodb://127.0.0.1:27017/test.input");
JavaPairRDD<Object, BSONObject> rdd = sc.newAPIHadoopRDD(conf, MongoInputFormat.class, Object.class, BSONObject.class);
System.out.println(rdd.count());
我得到的例外是:
14/08/06 9点49分57秒INFO rdd.NewHadoopRDD:输入分裂:
The exception I get is: 14/08/06 09:49:57 INFO rdd.NewHadoopRDD: Input split:
MongoInputSplit{URI=mongodb://127.0.0.1:27017/test.input, authURI=null, min={ "_id" : { "$oid" : "53df98d7e4b0a67992b31f8d"}}, max={ "_id" : { "$oid" : "53df98d7e4b0a67992b331b8"}}, query={ }, sort={ }, fields={ }, notimeout=false} 14/08/06 09:49:57
WARN scheduler.TaskSetManager: Loss was due to java.lang.IllegalStateException
java.lang.IllegalStateException: not ready
at org.bson.BasicBSONDecoder._decode(BasicBSONDecoder.java:139)
at org.bson.BasicBSONDecoder.decode(BasicBSONDecoder.java:123)
at com.mongodb.hadoop.input.MongoInputSplit.readFields(MongoInputSplit.java:185)
at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:285)
at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:77)
at org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:42)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:88)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
at java.lang.reflect.Method.invoke(Method.java:618)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1089)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1962)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1867)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1419)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2059)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1984)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1867)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1419)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:420)
at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:147)
at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1906)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1865)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1419)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:420)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:85)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:165)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1156)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:626)
at java.lang.Thread.run(Thread.java:804)
所有程序的输出是这里
环境:
- 红帽
- 星火1.0.1
- 的Hadoop 2.4.1
- MongoDB的2.4.10
- 蒙戈-Hadoop的1.3
推荐答案
我想我已经找到了问题:MongoDB的-的Hadoop在其BSON EN codeR /德codeR实例的静态修饰符核心/ src目录/主/ JAVA / COM / MongoDB的/ Hadoop的/输入/ MongoInputSplit.java。当Spark在多线程模式下运行的所有线程尝试使用deserialise在相同 EN codeR /德codeR实例,predicatbly有坏的结果。
I think I've found the issue: mongodb-hadoop has a "static" modifier on its BSON encoder/decoder instances in core/src/main/java/com/mongodb/hadoop/input/MongoInputSplit.java. When Spark runs in multithreaded mode all the threads try and deserialise using the same encoder/decoder instances, which predicatbly has bad results.
补丁在我github上这里
(已提交pull请求上游)
Patch on my github here (have submitted a pull request upstream)
我现在可以运行一个8核多线程的火花>蒙戈集合计数()从Python的!
I'm now able to run an 8 core multithreaded Spark->mongo collection count() from Python!
这篇关于异常而连接在火花的MongoDB的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!