Not able to complete the word count program in Spark using Scala
Problem description
I am doing some basic programs in Scala, and I am trying to write a word count program:
scala> val myWords = "HI HOW HI HOW ARE"
myWords: String = HI HOW HI HOW ARE
scala> val mySplit = myWords.split(" ")
mySplit: Array[String] = Array(HI, HOW, HI, HOW, ARE)
scala> val myMap = mySplit.map(x => (x,1))
myMap: Array[(String, Int)] = Array((HI,1), (HOW,1), (HI,1), (HOW,1), (ARE,1))
scala> val myCount = myMap.reduceByKey((a,b) => a+b)
<console>:16: error: value reduceByKey is not a member of Array[(String, Int)]
val myCount = myMap.reduceByKey((a,b) => a+b)
I am not sure what this error means.
So I used tab completion to see which methods are available:
scala> val myCount = myMap.
apply asInstanceOf clone isInstanceOf length toString update
Could someone explain to me where I went wrong in my code?
Recommended answer
I think that your code comes from an Apache Spark example. reduceByKey is a Spark method defined on pair RDDs, not on a plain Scala Array, which is why the compiler rejects it. To do word count in plain Scala, you can use groupBy or the fold* methods from the Seq trait.
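For example, here is a minimal plain-Scala version of the same word count using groupBy (no Spark needed; the variable names mirror the question):

```scala
// Plain Scala word count: no Spark required.
val myWords = "HI HOW HI HOW ARE"
val mySplit = myWords.split(" ")

// Group identical words together, then count each group's size.
val myCount: Map[String, Int] =
  mySplit.groupBy(identity).map { case (word, occurrences) => (word, occurrences.length) }

println(myCount) // a Map with HI -> 2, HOW -> 2, ARE -> 1 (order not guaranteed)
```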
Edit: I see from your comment that you are indeed using Spark. Then what you need to do is turn your array into an RDD, which does have reduceByKey. So you use sc.parallelize to turn a Seq into an RDD. Then your code will work.
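Putting that together, the fixed version of the session would look like this. This is a sketch that assumes you are inside spark-shell, where sc is the predefined SparkContext, so it will not run as a standalone script:

```scala
// Assumes a spark-shell session, where sc is the predefined SparkContext.
val myWords = "HI HOW HI HOW ARE"

// Array[String] -> RDD[String]: this is the step that was missing.
val myRdd = sc.parallelize(myWords.split(" "))

// RDD[(String, Int)] pairs; reduceByKey is available on pair RDDs.
val myMap = myRdd.map(x => (x, 1))
val myCount = myMap.reduceByKey((a, b) => a + b)

// Bring the results back to the driver and print them.
myCount.collect().foreach(println)
```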