How to solve SPARK-5063 in nested map functions

Problem description

RDD transformations and actions can only be invoked by the driver, not inside of other transformations; for example, rdd1.map(x => rdd2.values.count() * x) is invalid because the values transformation and count action cannot be performed inside of the rdd1.map transformation. For more information, see SPARK-5063.

As the error says, I'm trying to map (transform) a JavaRDD object inside the main map function. How can this be done with Apache Spark?

The main JavaPairRDD object (TextFile and Word are user-defined classes):

JavaPairRDD<TextFile, JavaRDD<Word>> filesWithWords = new...

And the map function:

filesWithWords.map(textFileJavaRDDTuple2 -> textFileJavaRDDTuple2._2().map(word -> new Word(word.getText(), (long) textFileJavaRDDTuple2._1().getText().split(word.getText()).length)));

I also tried foreach instead of map, but that does not work either. (And of course I searched for SPARK-5063.)

Recommended answer

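In the same way that nested operations on RDDs are not supported, nested RDD types are not possible in Spark. RDDs are only defined at the driver, where, in combination with their SparkContext, they can schedule operations on the data they represent. An RDD that ends up inside another RDD's records or closures is used on the executors, where no SparkContext is available, so it cannot run any operation there.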

So the root cause we need to address in this case is the data type:

JavaPairRDD<TextFile, JavaRDD<Word>> filesWithWords

Such a type has no valid use in Spark. Depending on the use case, which the question does not explain further, it should become one of the following:

A collection of RDDs, with the text file they refer to:

Map<TextFile, JavaRDD<Word>>

Or a collection of (TextFile, Word) pairs, keyed by text file:

JavaPairRDD<TextFile, Word>

Or a collection of text files, each paired with the list of its words:

JavaPairRDD<TextFile, List<Word>>

Once the type is corrected, the issues with the nested RDD operations will be naturally solved.
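A minimal sketch of why the corrected type removes the problem, assuming `JavaPairRDD<TextFile, List<Word>>` (or `JavaPairRDD<TextFile, Word>`): the per-file work becomes ordinary Java code over plain values, with no RDD involved, so it can run inside a single transformation. The `TextFile` and `Word` classes below are hypothetical minimal stand-ins (the question only shows `getText()` and a `Word(String, long)` constructor), and `WordRecount`, `recount`, `recountOne` and `getCount()` are names invented for this illustration. The count reuses the question's `split(...).length` expression unchanged.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the question's TextFile class.
// Serializable because Spark may need to serialize records (e.g. for a shuffle or collect()).
class TextFile implements Serializable {
    private final String text;
    TextFile(String text) { this.text = text; }
    String getText() { return text; }
}

// Hypothetical stand-in for the question's Word class (text plus a count).
class Word implements Serializable {
    private final String text;
    private final long count;
    Word(String text, long count) { this.text = text; this.count = count; }
    String getText() { return text; }
    long getCount() { return count; }
}

class WordRecount {
    // Per-pair logic for a JavaPairRDD<TextFile, Word>: plain values in, plain value out.
    // Same expression as in the question. Note that String.split treats its argument
    // as a regex and drops trailing empty strings, so this is not an exact occurrence count.
    static Word recountOne(TextFile file, Word word) {
        long count = file.getText().split(word.getText()).length;
        return new Word(word.getText(), count);
    }

    // Per-record logic for a JavaPairRDD<TextFile, List<Word>>: iterates a local List,
    // not an RDD, so nothing here needs a SparkContext.
    static List<Word> recount(TextFile file, List<Word> words) {
        List<Word> result = new ArrayList<>(words.size());
        for (Word word : words) {
            result.add(recountOne(file, word));
        }
        return result;
    }
}
```

On the driver, these functions would then be plugged into one transformation, for example `filesWithWords.mapToPair(t -> new Tuple2<>(t._1(), WordRecount.recount(t._1(), t._2())))` with `filesWithWords` redeclared as a `JavaPairRDD<TextFile, List<Word>>`, or the same call with `recountOne` for a `JavaPairRDD<TextFile, Word>` (`Tuple2` is `scala.Tuple2`). Because the lambda references no RDD, there is no nested RDD operation left to trigger SPARK-5063.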
