Spark Streaming warning: "replicated to only 0 peer(s) instead of 1 peers"


Question

I use Spark Streaming to receive tweets from Twitter, and I get many warnings that say:

replicated to only 0 peer(s) instead of 1 peers

What does this warning mean?

My code is:

    SparkConf conf = new SparkConf().setAppName("Test");
    JavaStreamingContext sc = new JavaStreamingContext(conf, Durations.seconds(5));
    sc.checkpoint("/home/arman/Desktop/checkpoint");

    ConfigurationBuilder cb = new ConfigurationBuilder();
    cb.setOAuthConsumerKey("****************")
        .setOAuthConsumerSecret("**************")
        .setOAuthAccessToken("*********************")
        .setOAuthAccessTokenSecret("***************");


    JavaReceiverInputDStream<twitter4j.Status> statuses = TwitterUtils.createStream(sc, 
            AuthorizationFactory.getInstance(cb.build()));

    JavaPairDStream<String, Long> hashtags = statuses.flatMapToPair(new GetHashtags());
    JavaPairDStream<String, Long> hashtagsCount = hashtags.updateStateByKey(new UpdateReduce());
    hashtagsCount.foreachRDD(new saveText(args[0], true));

    sc.start();
    sc.awaitTerminationOrTimeout(Long.parseLong(args[1]));
    sc.stop();


Recommended answer

When reading data with Spark Streaming, incoming data blocks are replicated to at least one other node/worker for fault tolerance. Without replication, if the runtime reads data from the stream and then fails, that particular piece of data is lost: it has already been read and erased from the stream, and the worker's copy is gone because of the failure.

According to the Spark documentation:

"While a Spark Streaming driver program is running, the system receives data from various sources and divides it into batches. Each batch of data is treated as an RDD, that is, an immutable parallel collection of data. These input RDDs are saved in memory and replicated to two nodes for fault-tolerance."

In your case, the warning means that incoming data from the stream is not being replicated at all. The likely reason is that you are running the app with only one Spark worker instance, or in local mode. Try starting more Spark workers and check whether the warning goes away.
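To make the arithmetic behind the message concrete, here is a minimal, self-contained sketch. It is a simplified model, not Spark's actual code: the class `ReplicationSketch` and the method `replicationWarning` are hypothetical names, and it assumes that "replicated to two nodes" means one local copy plus one copy on a peer.

```java
public class ReplicationSketch {

    /**
     * Hypothetical helper (not part of Spark): models how many peers a block
     * should be copied to and returns the warning text when too few peers
     * are available, or null when replication can be satisfied.
     *
     * Assumption: a replication level of N means one local copy
     * plus N - 1 copies on other nodes.
     */
    static String replicationWarning(int replicationLevel, int availablePeers) {
        int peersWanted = replicationLevel - 1;
        int peersUsed = Math.min(peersWanted, availablePeers);
        if (peersUsed < peersWanted) {
            return "replicated to only " + peersUsed
                    + " peer(s) instead of " + peersWanted + " peers";
        }
        return null;
    }

    public static void main(String[] args) {
        // Single worker or local mode: no other node can hold the second copy.
        System.out.println(replicationWarning(2, 0));
        // A second worker is available: the block can be replicated, no warning.
        System.out.println(replicationWarning(2, 1));
    }
}
```

With a replication level of 2 and no other worker, the sketch produces the same text as the warning in the question; once at least one peer is available it returns nothing, which matches the suggestion to start more workers.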
