Adding a language filter to Twitter PopularHashtags - Scala
Question
I am new to Spark and Scala. I ran the Spark Streaming Twitter popular hashtags job.
I added a filter for some words and was able to filter the tweets:
val filter = Array("spark", "Big Data")
val stream = TwitterUtils.createStream(ssc, None, filter)
Likewise, I want to add a language filter so that only English tweets are streamed. Twitter4j has track() and locations. Does it have a language filter? If so, how does it work in Scala?
Recommended answer
I am repeating what has already been said in this Spark thread: http://apache-spark-user-list.1001560.n3.nabble.com/filtering-out-non-English-tweets-using-TwitterUtils-td18614.html
Spark uses Twitter4J for the feed. Twitter4J as of version 3.0.6 has getLang, which allows you to write:
.filter(_.getLang == "en")
This filter can be applied to a DStream of twitter4j.Status.
But unfortunately Spark uses an older version of Twitter4J, which doesn't have getLang.
So the options are: upgrade Twitter4J within Spark to 3.0.6, wait for Spark to upgrade Twitter4J, or take an altogether different approach.
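For illustration, here is a minimal sketch of the language predicate, assuming a Twitter4J version that provides getLang is on the classpath. To keep it runnable without Spark or Twitter4J, it uses a hypothetical stand-in trait (StatusLike) and a hypothetical test class (FakeStatus) in place of twitter4j.Status. Neither name exists in either library. The null guard is a defensive assumption, not documented Twitter4J behavior.

```scala
// Hypothetical stand-in for twitter4j.Status, reduced to the two getters
// used in this sketch. The real interface has many more methods.
trait StatusLike {
  def getText: String
  def getLang: String
}

// Hypothetical test double; not part of Spark or Twitter4J.
final case class FakeStatus(getText: String, getLang: String) extends StatusLike

object LanguageFilter {
  // True only when the tweet's language code is exactly "en".
  // Option(...) turns a null language code into None, so it is rejected.
  def isEnglish(status: StatusLike): Boolean =
    Option(status.getLang).contains("en")

  def main(args: Array[String]): Unit = {
    val tweets = Seq(
      FakeStatus("Spark streaming is fun", "en"),
      FakeStatus("Bonjour Big Data", "fr"),
      FakeStatus("no language set", null)
    )
    // Only the first tweet passes the predicate and is printed.
    tweets.filter(isEnglish).foreach(t => println(t.getText))
  }
}
```

On a real stream, the equivalent would be the stream.filter(_.getLang == "en") call from the answer, applied to the DStream returned by TwitterUtils.createStream. That only compiles once the Twitter4J version in use exposes getLang.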