Cannot print the contents of RDD
Problem description
I am trying to print the contents of my RDD, which has type RDD[(String,List[(String,String)])]:
val sc = new SparkContext(conf)
val splitted = rdd.map(line => line.split(","))
val processed = splitted.map(x=>(x(1),List((x(0),x(2),x(3),x(4)))))
val grouped = processed.reduceByKey((x,y) => (x ++ y))
System.out.println(grouped)
But I do not see the contents, only this:
ShuffledRDD[4] at reduceByKey at Consumer.scala:88
UPDATE:
Contents of the TXT file:
100001082016,230,111,1,1
100001082016,121,111,1,1
100001082016,110,111,1,1
UPDATE 2 (the whole code):
import org.apache.spark.{SparkConf, SparkContext}

class Consumer()
{
  def run() = {
    val conf = new SparkConf()
      .setAppName("TEST")
      .setMaster("local[*]")
    val sc = new SparkContext(conf)
    val rdd = sc.textFile("file:///usr/test/myfile.txt")
    val splitted = rdd.map(line => line.split(","))
    val processed = splitted.map(x=>(x(1),List((x(0),x(2),x(3),x(4)))))
    val grouped = processed.reduceByKey((x,y) => (x ++ y))
    System.out.println(grouped)
  }
}
Recommended answer
There is no problem here:
scala> val rdd = sc.parallelize(Seq("100001082016,230,111,1,1","100001082016,121,111,1,1","100001082016,110,111,1,1"))
// rdd: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at <console>:27
scala> val splitted = rdd.map(line => line.split(","))
// splitted: org.apache.spark.rdd.RDD[Array[String]] = MapPartitionsRDD[1] at map at <console>:29
scala> val processed = splitted.map(x=>(x(1),List((x(0),x(2),x(3),x(4)))))
// processed: org.apache.spark.rdd.RDD[(String, List[(String, String, String, String)])] = MapPartitionsRDD[2] at map at <console>:31
scala> val grouped = processed.reduceByKey((x,y) => (x ++ y))
// grouped: org.apache.spark.rdd.RDD[(String, List[(String, String, String, String)])] = ShuffledRDD[3] at reduceByKey at <console>:33
scala> grouped.collect().foreach(println)
// (121,List((100001082016,111,1,1)))
// (110,List((100001082016,111,1,1)))
// (230,List((100001082016,111,1,1)))
The following is wrong. It works as expected, but you have to understand the language to know what to expect. Printing an RDD calls its toString, which returns a description of the RDD, not its elements:
scala> System.out.println(grouped)
// ShuffledRDD[3] at reduceByKey at <console>:33
EDIT: Just to be clear, if you wish to print a collection, use the mkString method available on that collection to convert it into the format you want before printing.
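As a rough illustration of the mkString suggestion, the sketch below formats data shaped like the output of grouped.collect() shown earlier. It uses plain Scala collections so it runs without Spark. The object name MkStringExample, the helper formatEntry, and the "key -> [fields]" layout are assumptions made for this example; they are not part of the original answer.

```scala
object MkStringExample {
  // Stand-in for the Array returned by grouped.collect(); the values mirror
  // the sample file from the question. No Spark is needed for this sketch.
  val collected: Seq[(String, List[(String, String, String, String)])] = Seq(
    ("230", List(("100001082016", "111", "1", "1"))),
    ("121", List(("100001082016", "111", "1", "1"))),
    ("110", List(("100001082016", "111", "1", "1")))
  )

  // Hypothetical helper: render one (key, records) pair as a single line.
  def formatEntry(entry: (String, List[(String, String, String, String)])): String = {
    val (key, records) = entry
    val rendered = records
      .map { case (a, b, c, d) => Seq(a, b, c, d).mkString("[", ",", "]") }
      .mkString("; ")
    s"$key -> $rendered"
  }

  def main(args: Array[String]): Unit = {
    // Join the per-key lines with newlines and print them in one call.
    println(collected.map(formatEntry).mkString("\n"))
  }
}
```

With a real RDD, the same helper could presumably be applied after bringing the data to the driver, for example grouped.collect().map(MkStringExample.formatEntry).mkString("\n"). This is only reasonable when the RDD is small enough to fit in driver memory.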