Perform RDD operations on DataFrames


Question

I have a dataset with 10 fields, and I need to perform RDD operations on this DataFrame. Is it possible to perform RDD operations such as map, flatMap, etc.?

Here is my sample code:

df.select("COUNTY","VEHICLES").show();

This is my DataFrame. I need to convert it to an RDD and perform some RDD operations on the new RDD.

Here is how I converted the DataFrame to an RDD:

 RDD<Row> java = df.select("COUNTY","VEHICLES").rdd();

After converting it to an RDD, I am not able to see the RDD's contents. I tried:

java.collect();
java.take(10);
java.foreach();

In all of the above cases, I failed to get any results.

Please help me out.

Answer

For Spark 1.6:

You won't be able to see the results directly, because converting a DataFrame to an RDD gives you an RDD[Row]: every record stays wrapped in a Row object.

So when you try any of these:

java.collect();
java.take(10);
java.foreach();

collect() and take(10) give you an Array[Row] rather than the plain column values, so you don't get the results you are looking for.
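To make the point concrete: the values are still there, only wrapped inside Row objects, and getting at them means pulling each field out of its Row. The minimal sketch below uses two hand-built Rows as hypothetical stand-ins for what collect() would return (the object name and sample values are illustrative only) and reads the fields by position, 0 for COUNTY and 1 for VEHICLES.

```scala
import org.apache.spark.sql.Row

object RowAccessSketch {
  // Hypothetical rows standing in for the result of
  // df.select("COUNTY", "VEHICLES").rdd.collect()
  val rows: Array[Row] = Array(Row("Kings", "120"), Row("Queens", "85"))

  // Extract the plain values from each Row by position
  // (index 0 = COUNTY, index 1 = VEHICLES; both assumed to be strings)
  val values: Array[(String, String)] =
    rows.map(row => (row.getString(0), row.getString(1)))

  def main(args: Array[String]): Unit =
    values.foreach(println) // prints (Kings,120) then (Queens,85)
}
```

Access by name with getAs[String]("COUNTY") only works when the Row carries a schema. Rows taken from df.rdd do carry one, but hand-built Rows like these do not, which is why the sketch uses positions.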

Solution:

You can map each Row to its column values and get a new RDD out of that, like this:

val newDF = df.select("COUNTY", "VEHICLES")
// Map each Row to a plain (county, vehicles) tuple
val resultantRDD = newDF.rdd.map { row =>
  val county = row.getAs[String]("COUNTY")
  val vehicles = row.getAs[String]("VEHICLES")
  (county, vehicles)
}

Now you can apply foreach and collect to get the values.
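Here is a fuller, self-contained sketch of the same approach. It assumes Spark 2.x or later, where SparkSession is the entry point (it replaced SQLContext in 2.0; on 1.6 you would create the DataFrame through a SQLContext instead). The sample rows, the local[*] master and the helper name toPairs are hypothetical. Both columns are assumed to be strings, as the getAs[String] calls above imply.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

object DataFrameToRddSketch {
  // Hypothetical helper: the answer's map from Row to (county, vehicles) tuples.
  // Rows produced by df.rdd carry the schema, so getAs by column name works.
  def toPairs(df: DataFrame): RDD[(String, String)] =
    df.select("COUNTY", "VEHICLES").rdd.map { row =>
      (row.getAs[String]("COUNTY"), row.getAs[String]("VEHICLES"))
    }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumes Spark 2.x+)
    val spark = SparkSession.builder()
      .appName("df-to-rdd-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data with the two columns from the question
    val df = Seq(("Kings", "120"), ("Queens", "85"), ("Kings", "40"))
      .toDF("COUNTY", "VEHICLES")

    val resultantRDD = toPairs(df)

    // collect() brings the tuples to the driver, where they print as plain values
    resultantRDD.collect().foreach(println)

    // take(n) returns only the first n elements, which is handy for a quick look
    resultantRDD.take(2).foreach(println)

    spark.stop()
  }
}
```

Collecting before printing is deliberate. foreach(println) called directly on an RDD runs on the executors, so on a cluster the output ends up in the executor logs rather than on the driver's console. For large datasets, take(n) is the safer way to inspect results, because collect() pulls the entire RDD into driver memory.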

P.S.: The code is written in Scala, but it should convey the essence of what I am trying to do!
