How to create an empty DataFrame? Why "ValueError: RDD is empty"?
Question
I am trying to create an empty DataFrame in Spark (PySpark).
I am using an approach similar to the one discussed here, but it is not working.
Here is my code:
df = sqlContext.createDataFrame(sc.emptyRDD(), schema)
Here is the error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/Me/Desktop/spark-1.5.1-bin-hadoop2.6/python/pyspark/sql/context.py", line 404, in createDataFrame
rdd, schema = self._createFromRDD(data, schema, samplingRatio)
File "/Users/Me/Desktop/spark-1.5.1-bin-hadoop2.6/python/pyspark/sql/context.py", line 285, in _createFromRDD
struct = self._inferSchema(rdd, samplingRatio)
File "/Users/Me/Desktop/spark-1.5.1-bin-hadoop2.6/python/pyspark/sql/context.py", line 229, in _inferSchema
first = rdd.first()
File "/Users/Me/Desktop/spark-1.5.1-bin-hadoop2.6/python/pyspark/rdd.py", line 1320, in first
raise ValueError("RDD is empty")
ValueError: RDD is empty
Answer
Extending Joe Widen's answer, you can actually create a schema with no fields like so:
from pyspark.sql.types import StructType

schema = StructType([])
So when you create the DataFrame using that as your schema, you'll end up with a DataFrame[].
>>> empty = sqlContext.createDataFrame(sc.emptyRDD(), schema)
>>> empty
DataFrame[]
>>> empty.schema
StructType(List())
In Scala, if you choose to use sqlContext.emptyDataFrame and check out the schema, it will return StructType().
scala> val empty = sqlContext.emptyDataFrame
empty: org.apache.spark.sql.DataFrame = []
scala> empty.schema
res2: org.apache.spark.sql.types.StructType = StructType()