Is it possible to create nested RDDs in Apache Spark?


Question

I am trying to implement the K-nearest neighbor algorithm in Spark. I was wondering whether it is possible to work with nested RDDs, since that would make my life a lot easier. Consider the following code snippet.

public static void main(String[] args) {
    // ... setup omitted: testData and trainData are existing JavaRDD<Vector> instances ...
    JavaRDD<Double> temp1 = testData.map(
        new Function<Vector, Double>() {
            public Double call(final Vector z) throws Exception {
                // Nested use of trainData inside testData.map(): this is the part that fails.
                JavaRDD<Double> temp2 = trainData.map(
                    new Function<Vector, Double>() {
                        public Double call(Vector vector) throws Exception {
                            return (double) vector.length();
                        }
                    }
                );
                return (double) z.length();
            }
        }
    );
}

Currently I am getting an error with this nested setup (I can post the full log here). Is this allowed in the first place? Thanks.

Recommended answer

No, it is not possible. The items of an RDD must be serializable, and an RDD cannot be usefully serialized and shipped to the workers: it is a handle tied to the SparkContext on the driver, so transformations and actions on one RDD cannot be invoked from inside a transformation on another. This makes sense, because otherwise you could end up transferring a whole RDD over the network, which is a problem if it contains a lot of data. If it does not contain a lot of data, you can, and should, use an array or something similar instead.
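To make the "use an array" suggestion concrete, here is a minimal sketch in plain Java with no Spark dependency. It assumes the training set is small enough to hold in a local double[][]. The class LocalKnn and its methods are hypothetical names introduced for illustration, not part of Spark or of the asker's code.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper, not part of Spark: brute-force k-nearest neighbors
// against training data held in an ordinary local array.
final class LocalKnn {

    // Squared Euclidean distance between two points of equal dimension.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Returns the indices of the (at most) k training points closest to
    // the query, nearest first.
    static int[] nearest(double[][] train, double[] query, int k) {
        Integer[] order = new Integer[train.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort training indices by their distance to the query point.
        Arrays.sort(order,
            Comparator.comparingDouble((Integer i) -> squaredDistance(train[i], query)));
        int n = Math.min(k, order.length);
        int[] result = new int[n];
        for (int i = 0; i < n; i++) {
            result[i] = order[i];
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] train = { {0.0, 0.0}, {1.0, 1.0}, {5.0, 5.0}, {0.2, 0.1} };
        double[] query = {0.0, 0.1};
        System.out.println(Arrays.toString(nearest(train, query, 2))); // prints [0, 3]
    }
}
```

In a Spark job, one plausible arrangement is to build the array on the driver (for example from trainData.collect()), share it through a broadcast variable, and call a function like nearest from inside testData.map, so that only one RDD is involved in the transformation. That wiring is an assumption about how the pieces would fit together, not something shown in the original answer.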

However, I don't know how you are implementing K-nearest neighbors, so be careful: if you do something like computing the distance between every pair of points, this does not scale with the dataset size, because it is O(n^2).
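To give a feel for the quadratic growth mentioned above, the following sketch counts how many distance evaluations an all-pairs comparison needs for n points. PairCount is a hypothetical name used only for this illustration.

```java
// Hypothetical illustration: number of distance evaluations needed to
// compare every unordered pair among n points.
final class PairCount {

    // Number of unordered pairs among n points: n * (n - 1) / 2.
    static long pairs(long n) {
        return n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        // Each tenfold increase in n costs roughly a hundredfold more work.
        for (long n = 1_000; n <= 1_000_000; n *= 10) {
            System.out.println(n + " points -> " + pairs(n) + " distance evaluations");
        }
    }
}
```

Doubling the number of points roughly quadruples the work, which is why brute-force pairwise distances become impractical on large datasets.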
