Cannot create dataframe from list: pyspark
Question
I have a list that is generated by a function. When I execute print on my list:
print(preds_labels)
I get:
[(0.,8.),(0.,13.),(0.,19.),(0.,19.),(0.,19.),(0.,19.),(0.,19.),(0.,20.),(0.,21.),(0.,23.)]
But when I try to create a DataFrame with this command:
df = sqlContext.createDataFrame(preds_labels, ["prediction", "label"])
I get an error message:
not supported type: type 'numpy.float64'
If I create the list manually, I have no problem. Do you have any idea what is going wrong?
Recommended answer
pyspark uses its own type system, and unfortunately it doesn't deal with numpy well. It does work with Python types, though, so you can manually convert each numpy.float64 to float, like this:
df = sqlContext.createDataFrame(
    # Convert each numpy.float64 to a built-in float before handing the rows to Spark
    [(float(tup[0]), float(tup[1])) for tup in preds_labels],
    ["prediction", "label"]
)
Note that pyspark will then treat them as pyspark.sql.types.DoubleType.
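To check the conversion without a Spark session, here is a minimal sketch. It assumes numpy is installed; the helper name to_python_rows and the two-row sample list are hypothetical stand-ins for the function-generated list in the question. It only shows that the values end up as built-in floats. The createDataFrame call is left as a comment because no sqlContext is created here.

```python
import numpy as np


def to_python_rows(rows):
    """Return the rows with every value converted to a built-in Python float.

    Hypothetical helper for illustration; it is not part of pyspark or numpy.
    """
    return [tuple(float(value) for value in row) for row in rows]


# Stand-in for the function-generated list from the question (assumed shape).
preds_labels = [
    (np.float64(0.0), np.float64(8.0)),
    (np.float64(0.0), np.float64(13.0)),
]

clean_rows = to_python_rows(preds_labels)

# The values are now exactly the built-in float type, not numpy.float64.
print([type(value).__name__ for row in clean_rows for value in row])

# With an existing sqlContext (not created in this sketch), the cleaned rows
# could then be passed on as in the answer:
# df = sqlContext.createDataFrame(clean_rows, ["prediction", "label"])
```

If the predictions and labels start out as a numpy array rather than a list of tuples, calling tolist() on the array should also yield plain Python floats, which may be simpler than converting value by value.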