使用 Apache Flink 连接到 Twitter Streaming API 时的 IOExcpetion [英] IOExcpetion while connecting to Twitter Streaming API with Apache Flink

查看:33
本文介绍了使用 Apache Flink 连接到 Twitter Streaming API 时的 IOExcpetion的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我编写了一个使用 Apache Flink Streaming API 读取 Twitter 推文的小型 Scala 程序.

I wrote a small Scala program which uses the Apache Flink Streaming API to read Twitter tweets.

object TwitterWordCount {
  private val properties = "/home/twitter-login.properties"
  def main(args: Array[String]) {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val twitterStream = env.addSource(new TwitterSource(properties))
    val tweets = twitterStream
      .flatMap(new JSONParseFlatMap[String, String] {
        override def flatMap(in: String, out: Collector[String]): Unit = {
          if (getString(in, "user.lang") == "en") {
            out.collect(getString(in, "text"))
          }
        }
      })
    tweets.print
    env.execute("tweets")
  }
}

执行时遇到以下问题:

14:35:48,353 INFO  com.twitter.hbc.httpclient.ClientBase - twitterSourceClient Establishing a connection
14:35:48,354 DEBUG org.apache.http.impl.conn.PoolingClientConnectionManager - Connection request: [route: {}->http://stream.twitter.com][total kept alive: 0; route allocated: 0 of 2; total allocated: 0 of 20]
14:35:48,354 DEBUG org.apache.http.impl.conn.PoolingClientConnectionManager - Connection leased: [id: 4][route: {}->http://stream.twitter.com][total kept alive: 0; route allocated: 1 of 2; total allocated: 1 of 20]
14:35:48,354 DEBUG org.apache.http.impl.conn.DefaultClientConnectionOperator - Connecting to stream.twitter.com:80
14:35:49,486 DEBUG org.apache.flink.runtime.taskmanager.TaskManager - Received message SendHeartbeat at akka://flink/user/taskmanager_1 from Actor[akka://flink/deadLetters].
14:35:49,486 DEBUG org.apache.flink.runtime.taskmanager.TaskManager - Sending heartbeat to JobManager
14:35:49,487 DEBUG org.apache.flink.runtime.taskmanager.TaskManager - Handled message SendHeartbeat in 1 ms from Actor[akka://flink/deadLetters].
14:35:49,487 DEBUG org.apache.flink.runtime.jobmanager.JobManager - Received message Heartbeat(cb51cdb1bd08879df10bd2198b8e043a,[B@4daaaf5f) at akka://flink/user/jobmanager from Actor[akka://flink/user/taskmanager_1#-64418449].
14:35:49,488 DEBUG org.apache.flink.runtime.jobmanager.JobManager - Received hearbeat message from cb51cdb1bd08879df10bd2198b8e043a.
14:35:49,488 DEBUG org.apache.flink.runtime.instance.InstanceManager - Received heartbeat from TaskManager cb51cdb1bd08879df10bd2198b8e043a @ localhost - 8 slots - URL: akka://flink/user/taskmanager_1
14:35:49,488 DEBUG org.apache.flink.runtime.jobmanager.JobManager - Handled message Heartbeat(cb51cdb1bd08879df10bd2198b8e043a,[B@4daaaf5f) in 0 ms from Actor[akka://flink/user/taskmanager_1#-64418449].
14:35:52,358 DEBUG org.apache.http.impl.conn.DefaultClientConnection - Connection org.apache.http.impl.conn.DefaultClientConnection@64c88f2d closed
14:35:52,358 DEBUG org.apache.http.impl.conn.DefaultClientConnection - Connection org.apache.http.impl.conn.DefaultClientConnection@64c88f2d shut down
14:35:52,358 DEBUG org.apache.http.impl.conn.PoolingClientConnectionManager - Connection [id: 4][route: {}->http://stream.twitter.com] can be kept alive for 9223372036854775807 MILLISECONDS
14:35:52,358 DEBUG org.apache.http.impl.conn.DefaultClientConnection - Connection org.apache.http.impl.conn.DefaultClientConnection@64c88f2d closed
14:35:52,358 DEBUG org.apache.http.impl.conn.PoolingClientConnectionManager - Connection released: [id: 4][route: {}->http://stream.twitter.com][total kept alive: 0; route allocated: 0 of 2; total allocated: 0 of 20]
14:35:52,359 WARN  com.twitter.hbc.httpclient.ClientBase - twitterSourceClient IOException caught when establishing connection to https://stream.twitter.com/1.1/statuses/filter.json?delimited=length
14:35:53,613 WARN  com.twitter.hbc.httpclient.ClientBase - twitterSourceClient failed to establish connection properly
14:35:53,613 INFO  com.twitter.hbc.httpclient.ClientBase - twitterSourceClient Done processing, preparing to close connection
14:35:53,613 DEBUG org.apache.http.impl.conn.PoolingClientConnectionManager - Connection manager is shutting down
14:35:53,613 DEBUG org.apache.http.impl.conn.PoolingClientConnectionManager - Connection manager shut down

程序尝试重新建立连接.所以这 4 行日志消息继续发出.

The program tries to re-establish the connection. So this 4 lines of log message continue being emitted.

奇怪的是,当我运行 example 一切正常(我拉了最新版本的 master来自 GitHub).我什至使用相同的属性文件.如果我将该示例类复制到我自己的项目中,也会出现上述问题.

The strange thing about this is, when I run the example provided in the Apache Flink project everything works just fine (I pulled the latest version of master from GitHub). I even use the same properties file. If I copy that example class to my own project the problem state above occurs too.

我使用 Flink 原型来创建我自己的项目.我尝试了 0.9.1 和 0.10-SNAPSHOT 版本.依赖 flink-scalaflink-streaming-scalaflink-clientsflink-connector-twitter 是在相应版本中使用.

I used the Flink archetype to create my own project. I tried in version 0.9.1 as well as 0.10-SNAPSHOT. The dependencies flink-scala, flink-streaming-scala, flink-clients and flink-connector-twitter are used in the corresponding version.

有没有人遇到过类似的问题并且能让我走上正轨?

Does anyone have experienced a similar issue and can get me on the right track?

推荐答案

调试 com.twitter.hbc.httpclient.ClientBase 给我带来了以下异常:org.apache.http.conn.ConnectTimeoutException: 连接到 stream.twitter.com:80 超时

Debugging the com.twitter.hbc.httpclient.ClientBase brought me to the following Exception: org.apache.http.conn.ConnectTimeoutException: Connect to stream.twitter.com:80 timed out

根据 post 在 Twitter 开发者论坛上发生这种情况是因为 Apaches HttpClient 4.2 中的一个错误.事实上,解决我的项目上的依赖树表明 flink-runtime 依赖于 com.amazonaws:aws-java-sdk:1.81,而后者又依赖于 org.apache.httpcomponents:httpclient:4.2.

According to a post on the Twitter Developer forum this happens because of a bug in Apaches HttpClient 4.2. And in fact, resolving the dependency tree on my project shows that the flink-runtime has a dependency on com.amazonaws:aws-java-sdk:1.81 which again has a dependency on org.apache.httpcomponents:httpclient:4.2.

将HttpClient 4.2.6添加到我的项目的依赖中暂时解决了这个问题.

Adding HttpClient 4.2.6 to the dependencies of my project solved the problem temporarily.

这篇关于使用 Apache Flink 连接到 Twitter Streaming API 时的 IOExcpetion的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆