Convert JavaPairRDD to JavaRDD
Problem description
I am fetching data from Elasticsearch using the Elasticsearch-Hadoop library.
JavaPairRDD<String, Map<String, Object>> esRDD = JavaEsSpark.esRDD(sc);
Now I have a JavaPairRDD. I want to use Random Forest from MLlib on this RDD, so I convert it with JavaPairRDD.toRDD(esRDD), which gives me an RDD. From that RDD I then convert back to a JavaRDD:
JavaRDD<LabeledPoint>[] splits = (JavaRDD.fromRDD(JavaPairRDD.toRDD(esRDD),
esRDD.classTag())).randomSplit(new double[] { 0.5, 0.5 });
JavaRDD<LabeledPoint> trainingData = splits[0];
JavaRDD<LabeledPoint> testData = splits[1];
I want to pass trainingData and testData to the Random Forest algorithm, but the code fails at compile time with a type mismatch:
Type mismatch: cannot convert from JavaRDD[Tuple2[String,Map[String,Object]]][] to JavaRDD[LabeledPoint][]
Added square brackets as less than and greater than signs are not working
Could anyone suggest the proper way to do this conversion? I am new to Spark data structures.
What data do you have in the JavaPairRDD columns? Unlike a plain JavaRDD, a JavaPairRDD holds key/value pairs (Tuple2), with the first column as the key and the second as the value.
You probably want to drop the first column from the JavaPairRDD and keep a JavaRDD containing only the value column.
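To do this, simply run something like:
JavaRDD<Map<String, Object>> newRDD = esRDD.map(x -> x._2());
or an equivalent such as esRDD.values() to create a new JavaRDD without the first column. Note that the result is a JavaRDD of Map values, not of LabeledPoint, so each value still has to be mapped to a LabeledPoint before it can be passed to MLlib.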
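Dropping the key only changes the element type to Map<String, Object>; the compile error goes away once each document is turned into a label plus a numeric feature vector. A minimal sketch of that per-document step in plain Java is below. The class name DocToFeatures and its methods are hypothetical helpers, and it assumes the label and features are stored as numeric fields in each document, which the question does not confirm.

```java
import java.util.Map;

// Hypothetical helper (not part of Spark or elasticsearch-hadoop): extracts the
// numeric pieces a LabeledPoint needs from one Elasticsearch document.
final class DocToFeatures {

    // Reads one field and returns it as a double; fails loudly if the field
    // is missing or is not a number (assumption: features are numeric).
    static double numericField(Map<String, Object> doc, String field) {
        Object value = doc.get(field);
        if (!(value instanceof Number)) {
            throw new IllegalArgumentException(
                "Field '" + field + "' is missing or not numeric: " + value);
        }
        return ((Number) value).doubleValue();
    }

    // Builds the feature array in the order given by featureFields.
    static double[] features(Map<String, Object> doc, String[] featureFields) {
        double[] out = new double[featureFields.length];
        for (int i = 0; i < featureFields.length; i++) {
            out[i] = numericField(doc, featureFields[i]);
        }
        return out;
    }
}
```

In Spark, something along the lines of esRDD.values().map(doc -> new LabeledPoint(DocToFeatures.numericField(doc, "label"), Vectors.dense(DocToFeatures.features(doc, fields)))) should then yield a JavaRDD<LabeledPoint>, and calling randomSplit on that returns JavaRDD<LabeledPoint>[]. Here "label" and fields are placeholders for whatever field names your index actually uses.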