Spark Streaming: Rdd.Count() not returning a valid number

Problem description

In my application I have two JavaDStreams that contain some data. I am trying to count the number of rows in each JavaDStream; however, the result I get in the log is not a number but a completely different object. What am I doing wrong here?

Code:

    // map score result set to tweets
    JavaDStream<Tuple5<Long, String, Float, Float, String>> result =
            scoredTweets.map(new ScoreTweetsFunction());

    // get extra elements
    JavaDStream<Tuple7<Long, String, String, String, String, String, String>> extra_elements =
            json.map(new GetExtraElements());

    // join elements with score result
    System.out.println("Number of Rows in extra elements RDD: " + extra_elements.count());
    System.out.println("Number of Rows in result RDD: " + result.count());

Output from Log:

Number of Rows in extra elements RDD: org.apache.spark.streaming.api.java.JavaDStream@73358a55
Number of Rows in result RDD: org.apache.spark.streaming.api.java.JavaDStream@242aa3b2

Solution

A DStream is not an RDD but a continuous and potentially infinite sequence of RDDs. Because of that, it cannot be counted as a whole, and that is not how the count method is intended to work.

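Instead, count transforms the existing stream into another stream in which each RDD has a single element generated by counting each RDD of this DStream. The value concatenated into the log message is therefore a JavaDStream object, which is why its default string representation is printed rather than a number.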

If you want to perform some action on individual RDDs, use foreachRDD.
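
To make the difference concrete, here is a minimal plain-Java sketch that models the behaviour described above without using Spark. MiniStream, count and foreachBatch are hypothetical names invented for this illustration: a MiniStream is just a finite list of batches, with each batch standing in for one RDD. It is only a model of the semantics, not how Spark implements them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

// Toy stand-in for a DStream (NOT Spark code): a finite list of batches,
// where each batch plays the role of one RDD.
class MiniStream<T> {
    private final List<List<T>> batches;

    MiniStream(List<List<T>> batches) {
        this.batches = batches;
    }

    // Modeled on DStream.count(): returns another stream in which every
    // batch holds exactly one element, the size of the matching input batch.
    MiniStream<Long> count() {
        List<List<Long>> counted = new ArrayList<>();
        for (List<T> batch : batches) {
            counted.add(Collections.singletonList(Long.valueOf(batch.size())));
        }
        return new MiniStream<>(counted);
    }

    // Modeled on foreachRDD: runs an action once per batch.
    void foreachBatch(Consumer<List<T>> action) {
        batches.forEach(action);
    }

    public static void main(String[] args) {
        MiniStream<String> tweets = new MiniStream<>(Arrays.asList(
                Arrays.asList("a", "b", "c"),
                Arrays.asList("d")));

        // Concatenating the stream object calls its default toString(),
        // which yields a class name and hash code instead of a number.
        System.out.println("Rows: " + tweets.count());

        // Acting on each batch yields an actual number per batch.
        tweets.foreachBatch(batch ->
                System.out.println("Rows in batch: " + batch.size()));
    }
}
```

With the names from the question, the Spark-side equivalent would be along the lines of result.foreachRDD(rdd -> System.out.println("Number of Rows in result RDD: " + rdd.count())). Alternatively, result.count().print() prints the single-element count stream for each batch. Either way the numbers appear once per batch interval after the streaming context has started, not at the point where the stream is defined.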
