"java.lang.UnsupportedOperationException: empty collection"


Question

I'm working with Spark 2.1.1 and Scala 2.11.8.

I'm executing my code in spark-shell. This is the code I'm executing:

val read_file1 = sc.textFile("Path to file 1");

val uid = read_file1.map(line => line.split(",")).map(array => array.map(arr => {
  if (arr.contains(":")) (array(2).split(":")(0), arr.split(":")(0))
  else (array(2).split(":")(0), arr)}))

val rdd1 = uid.map(array => array.drop(4)).flatMap(array => array.toSeq).map(y=>(y,1)).reduceByKey(_+_)

The output of this code is:

(( v67430612_serv78i, fb_201906266952256),1)
(( v74005958_serv35i, fb_128431994336303),1)

However, when I join the outputs of the two RDDs with:

uid2.map(x => ((x._1, x._2), x._3)).join(rdd1).map(y => ((y._1._1, y._1._2, y._2._1), y._2._2))

I get this error message:

 "java.lang.UnsupportedOperationException: empty collection" 

Why does this error occur?

Here are samples of the input files:

File 1:

2017-05-09 21:52:42 , 1494391962 , p69465323_serv80i:10:450 , 7 , fb_406423006398063:396560, guest_861067032060185_android:671051, fb_100000829486587:186589, fb_100007900293502:407374, fb_172395756592775:649795
2017-05-09 21:52:42 , 1494391962 , z67265107_serv77i:4:45 , 2:Re , fb_106996523208498:110066, fb_274049626104849:86632, fb_111857069377742:69348, fb_127277511127344:46246

File 2:

fb_100008724660685,302502,-450,v300430479_serv73i:10:450,switchtable,2017-04-30 00:00:00    
fb_190306964768414,147785,-6580,r308423810_serv31i::20,invite,2017-04-30 00:00:00

I just noticed this: when I execute

rdd1.take(10).foreach(println) or rdd1.first()

I also get this message before the output:

WARN Executor: Managed memory leak detected; size = 39979424 bytes, TID = 11

I don't know whether this has anything to do with the problem.

Another note: this error only occurs when I call

res.first()

where res is

uid2.map(x => ((x._1, x._2), x._3)).join(rdd1).map(y => ((y._1._1, y._1._2, y._2._1), y._2._2))

When I run

res.take(10).foreach(println)

I don't get any output, but no error is returned either.

Recommended answer

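You forgot to trim the spaces in the tuples created from the split lines. File 1 separates its fields with " , ", so every key built from it keeps its surrounding spaces (note the leading space in "( v67430612_serv78i, fb_201906266952256)"), while the keys built from File 2 have none. The keys therefore never match and the join produces an empty RDD. Calling first() on an empty RDD throws "java.lang.UnsupportedOperationException: empty collection", whereas take(10) simply returns an empty array, which is why it prints nothing and raises no error.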
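To make the mismatch concrete, here is a minimal sketch in plain Scala collections (no Spark needed) that builds the join keys the same way the question's code does. The object and method names (TrimJoinSketch, file1Keys, file2Key, matches) are hypothetical, and the File 2 line is made up so that it shares a key with the File 1 sample; it is not taken from the real data.

```scala
object TrimJoinSketch {
  // First line of File 1 from the question, shortened to a single entry.
  val file1Line = "2017-05-09 21:52:42 , 1494391962 , p69465323_serv80i:10:450 , 7 , fb_406423006398063:396560"
  // Hypothetical File 2 line, invented so that it shares a key with file1Line.
  val file2Line = "fb_406423006398063,302502,-450,p69465323_serv80i:10:450,switchtable,2017-04-30 00:00:00"

  // Builds the File 1 keys as in the question; trimming is optional.
  def file1Keys(line: String, trim: Boolean): Seq[(String, String)] = {
    val clean: String => String = s => if (trim) s.trim else s
    val fields = line.split(",")
    val prefix = clean(fields(2).split(":")(0))
    fields.drop(4).toSeq.map(f => (prefix, clean(f.split(":")(0))))
  }

  // Builds the File 2 key (no padding in this file, trim is harmless).
  def file2Key(line: String): (String, String) = {
    val fields = line.split(",")
    (fields(3).split(":")(0).trim, fields(0).trim)
  }

  // Keys from File 1 that equal the File 2 key, i.e. what a join would keep.
  def matches(trim: Boolean): Seq[(String, String)] =
    file1Keys(file1Line, trim).filter(_ == file2Key(file2Line))

  def main(args: Array[String]): Unit = {
    println(matches(trim = false).size) // prints 0: " p69465323_serv80i" != "p69465323_serv80i"
    println(matches(trim = true).size)  // prints 1
  }
}
```

Without trimming, the File 1 key carries a leading space in both components, so equality fails and nothing would be joined.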

You can use the following solution; it works for me.

// File 1: build (field-3 prefix, entry) pairs, trimming the spaces left by split(",")
val read_file1 = sc.textFile("Path to file 1")

val uid = read_file1.map(line => line.split(",")).map(array => array.map(arr => {
  if (arr.contains(":")) (array(2).split(":")(0).trim, arr.split(":")(0).trim)
  else (array(2).split(":")(0).trim, arr.trim)}))

// Drop the four leading fields, then count occurrences of each pair
val rdd1 = uid.map(array => array.drop(4)).flatMap(array => array.toSeq).map(y => (y, 1)).reduceByKey(_ + _)

// File 2: (field-4 prefix, field 1, field 3), all trimmed
val read_file2 = sc.textFile("Path to File 2")
val uid2 = read_file2.map(line => { val arr = line.split(","); (arr(3).split(":")(0).trim, arr(0).trim, arr(2).trim) })

// Join on the first two components of each File 2 tuple
val res = uid2.map(x => ((x._1, x._2), x._3)).join(rdd1).map(y => ((y._1._1, y._1._2, y._2._1), y._2._2))
res.take(10).foreach(println)
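If the join can legitimately come back empty, it may help to avoid first() altogether, since first() throws on an empty RDD while take(n) returns an empty array. The sketch below is plain Scala so that it runs without Spark: firstOption is a hypothetical helper, and the arrays are stand-ins for what res.take(1) would return (the row values are made up).

```scala
object SafeFirstSketch {
  // Hypothetical helper: pass it the array returned by rdd.take(1).
  // Returns None for an empty result instead of throwing.
  def firstOption[T](taken: Array[T]): Option[T] = taken.headOption

  def main(args: Array[String]): Unit = {
    // Stand-in for res.take(1) when the join matched nothing.
    val emptyResult = Array.empty[((String, String, String), Int)]
    // Stand-in for res.take(1) when the join matched one row (made-up values).
    val oneResult = Array((("p69465323_serv80i", "fb_406423006398063", "-450"), 1))

    println(firstOption(emptyResult))         // prints None
    println(firstOption(oneResult).isDefined) // prints true
  }
}
```

In spark-shell the equivalent call would be firstOption(res.take(1)); res.isEmpty() is another way to check before calling first().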
