Flume - TwitterSource language filter


Problem description

I would like to ask for your help with the following problem.

I'm currently using Cloudera CDH 5.1.2 and tried to collect Twitter data with Flume, as described in the following Cloudera posts:

I downloaded the source and rebuilt the flume-sources after updating the versions in pom.xml:

<flume.version>1.5.0-cdh5.1.2</flume.version>
<hadoop.version>2.3.0-cdh5.1.2</hadoop.version>

It worked perfectly.

After that I wanted to add a language filter to capture only tweets in a specific language. For this, I modified TwitterSource.java to call the FilterQuery.language method, roughly like this:

FilterQuery query = new FilterQuery();
...
if (languages.length != 0) {
    query.language(languages);
}
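The snippet does not show where `languages` comes from. A minimal sketch, assuming the language codes are supplied as one comma-separated agent property (the property name `languages` and the class `LanguageConfig` are hypothetical, not part of the Cloudera source), could turn that string into the `String[]` that `FilterQuery.language` takes, returning an empty array when nothing is configured so the `languages.length != 0` guard skips the filter:

```java
import java.util.ArrayList;
import java.util.List;

public class LanguageConfig {

    // Hypothetical helper: converts a value such as "en, de,fr" (for example read
    // from a made-up "languages" agent property) into the String[] that would be
    // passed to FilterQuery.language. Blank entries are dropped.
    public static String[] parseLanguages(String raw) {
        if (raw == null) {
            return new String[0];
        }
        List<String> codes = new ArrayList<>();
        for (String part : raw.split(",")) {
            String code = part.trim();
            if (!code.isEmpty()) {
                codes.add(code);
            }
        }
        return codes.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] languages = parseLanguages("en, de,,fr ");
        // Mirrors the guard in TwitterSource: only apply the filter when non-empty.
        if (languages.length != 0) {
            System.out.println(String.join("|", languages)); // prints en|de|fr
        }
    }
}
```

Returning an empty array instead of `null` keeps the existing `languages.length != 0` check safe when the property is missing.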

I'm trying to use twitter4j-stream version 3.0.6. I updated it in pom.xml:

<!-- For the Twitter API -->
<dependency>
    <groupId>org.twitter4j</groupId>
    <artifactId>twitter4j-stream</artifactId>
    <version>3.0.6</version>
</dependency>

With these settings I rebuilt the jar (mvn package).

When I start my agent, I get the following exception (NoSuchMethodError):

Unable to start EventDrivenSourceRunner: { source:com.cloudera.flume.source.TwitterSource{name:Twitter,state:IDLE} } - Exception follows.
java.lang.NoSuchMethodError: twitter4j.FilterQuery.language([Ljava/lang/String;)Ltwitter4j/FilterQuery;
    at com.cloudera.flume.source.TwitterSource.start(TwitterSource.java:165)
    at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)
    at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

I checked, and this version of twitter4j-stream contains the language method:

  • github.com/yusuke/twitter4j/blob/3.0.6/twitter4j-stream/src/main/java/twitter4j/FilterQuery.java
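The descriptor in the error, `([Ljava/lang/String;)Ltwitter4j/FilterQuery;`, names a method that takes a `String[]` and returns a `FilterQuery`. A `NoSuchMethodError` therefore means the `FilterQuery` class actually loaded at runtime lacks `language(String[])`, whatever the 3.0.6 source says. One way to check this at runtime is reflection: ask which jar supplied the class and whether it has the method. A minimal sketch (the class `ClassOrigin` and its method names are illustrative, not from the post); on the agent's classpath you would pass `twitter4j.FilterQuery` as the argument:

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.security.CodeSource;

public class ClassOrigin {

    // Returns the location (jar or directory) the class was loaded from.
    // Bootstrap classes such as java.lang.String have no CodeSource.
    public static String locationOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "<bootstrap or unknown>";
        }
        URL url = source.getLocation();
        return url.toString();
    }

    // True if the class declares or inherits a public method with exactly
    // these parameter types (getMethod only sees public methods).
    public static boolean hasPublicMethod(Class<?> cls, String name, Class<?>... params) {
        try {
            Method m = cls.getMethod(name, params);
            return m != null;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Defaults to a JDK class so the sketch runs anywhere; pass
        // "twitter4j.FilterQuery" when running with the agent's classpath.
        String className = args.length > 0 ? args[0] : "java.lang.String";
        Class<?> cls = Class.forName(className);
        System.out.println(className + " loaded from " + locationOf(cls));
        System.out.println("language(String[]) present: "
                + hasPublicMethod(cls, "language", String[].class));
    }
}
```

Checking the loaded class, rather than the source on GitHub, shows which copy of twitter4j the JVM actually picked.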

What am I doing wrong?

Thanks in advance,

Peter

Solution

Finally I managed to solve this problem, so here's the solution for anyone facing the same issue.

First (in the case described in the original post) I placed my generated jar in /var/lib/flume-ng/plugins.d/twitter-streaming/lib/ and configured Cloudera Manager to use this location.

In this case CM placed this directory at the end of the classpath in the runner file (after the parcel directory). So the directory order in the classpath looked like this:

  • /opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/flume-ng/lib/*

  • /var/lib/flume-ng/plugins.d/twitter-streaming/lib/*

Unfortunately there were a twitter4j-stream-3.0.3.jar and a twitter4j-core-3.0.3.jar in the parcel directory, and Flume used those instead of 3.0.6. In that version FilterQuery.language evidently doesn't exist.
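The root cause is the same artifact appearing twice on the classpath in different versions, with the JVM loading a class from the first entry that contains it. A minimal sketch that flags such duplicates from an ordered list of jar file names, assuming names follow the usual Maven `artifactId-version.jar` pattern (the class `JarConflicts` is hypothetical). The sample input also assumes the 3.0.6 jars sit in the plugin `lib` directory as separate files; if they are shaded into the plugin jar itself, a name-based check like this will not see them:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JarConflicts {

    // Splits "artifactId-version.jar"; the version is assumed to start with a digit.
    private static final Pattern JAR_NAME = Pattern.compile("^(.+?)-(\\d.*)\\.jar$");

    // Given jar file names in classpath order, returns each artifact that appears
    // with more than one version, mapped to its versions in classpath order.
    public static Map<String, List<String>> findConflicts(List<String> jarsInOrder) {
        Map<String, List<String>> versions = new LinkedHashMap<>();
        for (String jar : jarsInOrder) {
            Matcher m = JAR_NAME.matcher(jar);
            if (!m.matches()) {
                continue; // name does not follow the assumed pattern
            }
            List<String> seen = versions.computeIfAbsent(m.group(1), k -> new ArrayList<>());
            if (!seen.contains(m.group(2))) {
                seen.add(m.group(2));
            }
        }
        Map<String, List<String>> conflicts = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : versions.entrySet()) {
            if (e.getValue().size() > 1) {
                conflicts.put(e.getKey(), e.getValue());
            }
        }
        return conflicts;
    }

    public static void main(String[] args) {
        // Order from the post: parcel lib/* first, then the plugin lib/*.
        List<String> jars = Arrays.asList(
                "twitter4j-core-3.0.3.jar",
                "twitter4j-stream-3.0.3.jar",
                "twitter4j-core-3.0.6.jar",
                "twitter4j-stream-3.0.6.jar");
        for (Map.Entry<String, List<String>> e : findConflicts(jars).entrySet()) {
            // For classes present in both jars, the first version listed wins.
            System.out.println(e.getKey() + " " + e.getValue()
                    + " -> first on classpath: " + e.getValue().get(0));
        }
    }
}
```

With the sample list this prints `twitter4j-core [3.0.3, 3.0.6] -> first on classpath: 3.0.3` and the same for `twitter4j-stream`, matching the situation described above.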

So I just deleted those jars from the parcel directory, and it works fine now.
