TypeError converting a Pandas Dataframe to Spark Dataframe in Pyspark
Problem description
I did my research, but didn't find anything on this. I want to convert a simple pandas.DataFrame to a Spark dataframe, like this:
df = pd.DataFrame({'col1': ['a', 'b', 'c'], 'col2': [1, 2, 3]})
sc_sql.createDataFrame(df, schema=df.columns.tolist())
The error I get is:
TypeError: Can not infer schema for type: <class 'str'>
I tried something even simpler:
df = pd.DataFrame([1, 2, 3])
sc_sql.createDataFrame(df)
Then I get:
TypeError: Can not infer schema for type: <class 'numpy.int64'>
Any help? Do I manually need to specify a schema or something?
sc_sql is a pyspark.sql.SQLContext; I am in a Jupyter notebook on Python 3.4 and Spark 1.6.
Thanks!
Recommended answer
It's related to your Spark version; later Spark releases make type inference smarter. You can fix this by supplying the schema explicitly:
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

mySchema = StructType([
    StructField("col1", StringType(), True),
    StructField("col2", IntegerType(), True),
])
sc_sql.createDataFrame(df, schema=mySchema)
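If you would rather not write out a schema by hand, another workaround (a sketch, not part of the original answer) is to convert the numpy values to native Python types before calling createDataFrame, since the TypeError above comes from Spark 1.6 not recognizing numpy.int64; the sc_sql context is assumed from the question:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'b', 'c'], 'col2': [1, 2, 3]})

# Spark 1.6 cannot infer a schema from numpy scalar types, so turn each
# row into a tuple of plain Python values first. numpy scalars expose
# .item(), which returns the equivalent native Python value.
rows = [tuple(v.item() if hasattr(v, 'item') else v for v in row)
        for row in df.itertuples(index=False, name=None)]

# rows now holds native Python str/int values that old Spark can infer from:
# sc_sql.createDataFrame(rows, schema=df.columns.tolist())  # sc_sql from the question
```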