Perform RDD operations on DataFrames
Problem description
I have a dataset with 10 fields, loaded as a DataFrame. I need to perform RDD operations on this DataFrame. Is it possible to perform RDD operations such as map, flatMap, etc.?
Here is my sample code:
df.select("COUNTY","VEHICLES").show();
This is my DataFrame, and I need to convert it to an RDD and run some RDD operations on the new RDD.
Here is the code I use to convert the DataFrame to an RDD:
RDD<Row> java = df.select("COUNTY","VEHICLES").rdd();
After converting to an RDD, I am not able to see the RDD results. I tried:
java.collect();
java.take(10);
java.foreach();
In all of the above cases I failed to get results.
Please help me out.
For Spark 1.6:
You won't see the column values directly, because converting a DataFrame to an RDD gives you an RDD[Row].
Hence, when you try any of these:
java.collect();
java.take(10);
java.foreach();
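collect() and take(10) return an Array[Row] rather than printing anything, and foreach() needs a function to apply to each Row, so none of these calls displays the data on its own.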
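What these calls return can be checked with a small sketch. It assumes a local Spark 1.6 setup with a plain SQLContext. The app name, the master setting, and the two sample rows are made up for illustration and are not from the question.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}

object InspectRows {
  def main(args: Array[String]): Unit = {
    // Local setup; app name and master are assumptions for this sketch.
    val sc = new SparkContext(
      new SparkConf().setAppName("inspect-rows").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Hypothetical sample data standing in for the question's DataFrame.
    val df = sc.parallelize(Seq(("Adams", "12"), ("Baker", "7")))
      .toDF("COUNTY", "VEHICLES")

    // .rdd on a DataFrame yields an RDD[Row].
    val rowRDD = df.select("COUNTY", "VEHICLES").rdd

    // collect() and take(n) hand back Array[Row]; printing is a separate step.
    val all: Array[Row] = rowRDD.collect()
    val firstTen: Array[Row] = rowRDD.take(10)
    all.foreach(println)
    firstTen.foreach(println)

    // foreach needs a function; it runs on the executors, so on a real
    // cluster this output lands in the executor logs, not the driver console.
    rowRDD.foreach(row => println(row))

    sc.stop()
  }
}
```

From Java, the same idea applies, although df.javaRDD() is usually more convenient than rdd() because it returns a JavaRDD<Row> with Java-friendly collect() and take() methods.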
Solution:
You can map each Row to its column values and get an RDD of those values, like this:
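val newDF = df.select("COUNTY", "VEHICLES")
// Map each Row to a (county, vehicles) tuple; getAs[String] assumes both columns hold strings
val resultantRDD = newDF.rdd.map { row =>
  val county = row.getAs[String]("COUNTY")
  val vehicles = row.getAs[String]("VEHICLES")
  (county, vehicles)
}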
Now you can apply foreach and collect to resultantRDD to get the values.
P.S.: The code is written in Scala, but you should get the gist of what I am doing.
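Since the question also asks about map and flatMap, here is a minimal end-to-end sketch of the approach above. It assumes a local Spark 1.6 setup. The sample rows, the space-separated format of VEHICLES, and the helper splitVehicles are hypothetical and only serve the illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object TupleRddOps {
  // Hypothetical helper: expands one (county, "a b c") pair into one pair per vehicle.
  def splitVehicles(county: String, vehicles: String): Seq[(String, String)] =
    vehicles.split(" ").map(v => (county, v)).toSeq

  def main(args: Array[String]): Unit = {
    // Local setup; app name and master are assumptions for this sketch.
    val sc = new SparkContext(
      new SparkConf().setAppName("tuple-rdd-ops").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Made-up data; VEHICLES is assumed to be a space-separated string.
    val df = sc.parallelize(Seq(("Adams", "car truck"), ("Baker", "bus")))
      .toDF("COUNTY", "VEHICLES")

    // map: Row -> (county, vehicles) tuple, as in the answer.
    val pairs = df.select("COUNTY", "VEHICLES").rdd.map { row =>
      (row.getAs[String]("COUNTY"), row.getAs[String]("VEHICLES"))
    }

    // flatMap: one output element per vehicle name.
    val perVehicle = pairs.flatMap { case (county, vehicles) =>
      splitVehicles(county, vehicles)
    }

    // collect() brings the tuples to the driver, where they can be printed.
    perVehicle.collect().foreach(println)

    sc.stop()
  }
}
```

Collecting before printing keeps the output on the driver, which is usually what you want when inspecting a small result.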