Spark RDD to DataFrame python

Question

I am trying to convert a Spark RDD to a DataFrame. I have seen the documentation and examples where the schema is passed to the sqlContext.createDataFrame(rdd, schema) function.

But I have 38 columns or fields, and this number will grow. Manually writing a schema that specifies every field would be a very tedious job.

Is there any other way to specify the schema without knowing the column information in advance?

Answer

There are two ways to convert an RDD to a DataFrame in Spark:

1. toDF()
2. createDataFrame(rdd, schema)

I will show you how you can do that dynamically.

The toDF() method converts an RDD[Row] into a DataFrame. The key point is that Row() accepts **kwargs, so each record can be turned into a Row from a dictionary whose keys are generated column names:

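from pyspark.sql import Row

# Map each position in the record ("0", "1", ...) to its value
def f(x):
    return {str(i): value for i, value in enumerate(x)}

# Turn every record into a Row with those named fields, then convert it.
# rdd.toDF() is only available once a SparkSession/SQLContext has been created.
df = rdd.map(lambda x: Row(**f(x))).toDF()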

This way you can create a DataFrame dynamically.
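One caveat worth checking with the Row approach: in Spark releases before 3.0, Row(**kwargs) sorts its fields alphabetically by name (3.0 and later keep the order in which they were passed, on Python 3.6+). With names "0" to "37" that sorting is lexicographic, so "10" lands before "2" and the columns come out scrambled. The pure-Python sketch below shows the effect and one possible workaround, zero-padding the generated names; `row_dict` and the `c` prefix are hypothetical choices for illustration, not part of Spark.

```python
def row_dict(values, prefix="c"):
    """Map positional values to field names whose alphabetical order
    matches their position (hypothetical helper, not a Spark API)."""
    # Digits needed for the largest index, e.g. 2 for 12 columns.
    width = len(str(max(len(values) - 1, 0)))
    return {f"{prefix}{i:0{width}d}": v for i, v in enumerate(values)}


values = list(range(12))

# Plain str(i) names: alphabetical sorting puts "10" and "11" before "2".
print(sorted(str(i) for i in range(len(values)))[:4])  # ['0', '1', '10', '11']

# Zero-padded names: alphabetical order equals positional order.
padded = row_dict(values)
print(sorted(padded) == list(padded))  # True
```

In the Row-based code, `row_dict` could stand in for `f`, e.g. `rdd.map(lambda x: Row(**row_dict(x))).toDF()`, so the column order is the same whether or not the Spark version sorts Row fields.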

Another way is to build the schema dynamically, like this:

from pyspark.sql.types import StructType, StructField, StringType

# One nullable string column per field, named "0", "1", ...;
# the column count is read from the first record instead of being hard-coded
num_cols = len(rdd.first())
schema = StructType([StructField(str(i), StringType(), True) for i in range(num_cols)])

df = sqlContext.createDataFrame(rdd, schema)

The second approach is cleaner.
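Note that an all-StringType schema only fits data whose fields are already strings: with the default schema verification, createDataFrame rejects, for example, an int in a StringType column. If the first record is representative, one option is to derive the column types from it. The sketch below assumes that, and builds a datatype string of the "name: type, ..." form that createDataFrame also accepts as its schema argument; `spark_type_name`, `ddl_from_sample` and the `c` prefix are hypothetical names, and only four Python types are mapped.

```python
# Python type -> Spark SQL type name. bool is checked before int
# because bool is a subclass of int.
_TYPE_NAMES = [
    (bool, "boolean"),
    (int, "bigint"),
    (float, "double"),
    (str, "string"),
]


def spark_type_name(value):
    """Return a Spark SQL type name for one sample value."""
    for py_type, name in _TYPE_NAMES:
        if isinstance(value, py_type):
            return name
    return "string"  # assumption: unrecognised values (including None) become text


def ddl_from_sample(sample, prefix="c"):
    """Build a 'name: type, ...' schema string from one sample record."""
    # Zero-pad the column index so names also sort in positional order.
    width = len(str(max(len(sample) - 1, 0)))
    return ", ".join(
        f"{prefix}{i:0{width}d}: {spark_type_name(v)}" for i, v in enumerate(sample)
    )


print(ddl_from_sample(("a", 1, 2.5, True)))
# c0: string, c1: bigint, c2: double, c3: boolean
```

With Spark available, the string would be passed as the schema, e.g. `df = sqlContext.createDataFrame(rdd, ddl_from_sample(rdd.first()))`. Only the first record is inspected, so a column whose first value is None is typed as string, and dates, nested structures or mixed types still need an explicit StructType.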

So this is how you can create DataFrames dynamically.