Not able to complete the word count program in Spark using Scala


Problem description

I am doing some basic programs in Scala.

I am trying to get a word count program working in Scala:

scala> val myWords = "HI HOW HI HOW ARE"
myWords: String = HI HOW HI HOW ARE

scala> val mySplit = myWords.split(" ")
mySplit: Array[String] = Array(HI, HOW, HI, HOW, ARE)

scala> val myMap = mySplit.map(x => (x,1))
myMap: Array[(String, Int)] = Array((HI,1), (HOW,1), (HI,1), (HOW,1), (ARE,1))

scala> val myCount = myMap.reduceByKey((a,b) => a+b)
<console>:16: error: value reduceByKey is not a member of Array[(String, Int)]
       val myCount = myMap.reduceByKey((a,b) => a+b)

I am not sure what this error means.

So I tried to find what methods are available on it:

scala> val myCount = myMap.
apply          asInstanceOf   clone          isInstanceOf   length            toString       update

Could someone explain where I went wrong in my code?

Recommended answer

I think that your code comes from an Apache Spark example. To do a word count in plain Scala, you can use groupBy or the fold* methods from the Seq trait.
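As a sketch of the plain-Scala approach, you can group equal words together with groupBy and then map each group to its size:

```scala
object WordCount {
  def main(args: Array[String]): Unit = {
    val myWords = "HI HOW HI HOW ARE"
    val mySplit = myWords.split(" ")

    // groupBy(identity) yields Map(word -> Array of occurrences);
    // mapping each group to its length gives the count per word.
    val myCount: Map[String, Int] =
      mySplit.groupBy(identity).map { case (word, occurrences) =>
        (word, occurrences.length)
      }

    println(myCount) // e.g. Map(HI -> 2, HOW -> 2, ARE -> 1), ordering unspecified
  }
}
```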

Edit: I see from your comment that you are indeed using Spark. Then what you need to do is turn your array into an RDD, which has reduceByKey. So you use sc.parallelize to turn a Seq into an RDD, and then your code will work.
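A minimal sketch of the Spark version, assuming a SparkContext named sc is already in scope (as it is in spark-shell):

```scala
// In spark-shell, sc is provided for you; in a standalone
// application you would build one from a SparkConf first.
val myWords = "HI HOW HI HOW ARE"

// sc.parallelize turns the local collection into an RDD;
// RDDs of pairs are what provide reduceByKey.
val myRdd = sc.parallelize(myWords.split(" ").toSeq)

val myCount = myRdd.map(x => (x, 1)).reduceByKey((a, b) => a + b)

// collect() brings the results back to the driver.
myCount.collect().foreach(println) // e.g. (HI,2), (HOW,2), (ARE,1)
```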
