Spark RDD to DataFrame python

Question

I am trying to convert a Spark RDD to a DataFrame. I have seen the documentation and examples where the schema is passed to the sqlContext.createDataFrame(rdd, schema) function.

But I have 38 columns or fields, and this number will keep growing. Specifying the schema manually, field by field, would be a very tedious job.

Is there any other way to specify the schema without knowing the column information beforehand?

Recommended answer

Please see:

There are two ways to convert an RDD to a DataFrame in Spark: toDF() and createDataFrame(rdd, schema).

I will show you how to do both dynamically.

The toDF() method converts an RDD of Row objects to a DataFrame. The key point is that the Row() constructor accepts **kwargs, so each record can be turned into a Row built from a dictionary of column names to values. That gives an easy way to do it:

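from pyspark.sql import Row

# Build a {column_name: value} dict for one record; columns are named "0", "1", ...
def f(x):
    d = {}
    for i in range(len(x)):
        d[str(i)] = x[i]
    return d

# Convert every record to a Row and turn the RDD into a DataFrame.
# rdd.toDF() is only available once a SparkSession (or SQLContext) has been created.
# Before Spark 3.0, Row(**kwargs) sorts fields by name, so column "10" would come before "2".
df = rdd.map(lambda x: Row(**f(x))).toDF()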

This way, you can create the DataFrame dynamically, without writing out each column by hand.

Another way is to create the schema dynamically, like this:

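from pyspark.sql.types import StructType, StructField, StringType

# Take the number of columns from the data instead of hard-coding it
num_cols = len(rdd.first())

# One nullable StringType column per field, named "0", "1", ...
# Every non-null value must then be a str, or schema verification will fail
schema = StructType([StructField(str(i), StringType(), True) for i in range(num_cols)])

df = sqlContext.createDataFrame(rdd, schema)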

The second way is cleaner, since the schema is declared once instead of building a dictionary for every row.

So this is how you can create DataFrames dynamically.
