Unable to infer schema when loading Parquet file

Views: 553 Published: 2020/9/4 2:59:20 Tags: apache-spark pyspark parquet


Problem description

response = "mi_or_chd_5"

outcome = sqlc.sql("""select eid,{response} as response
from outcomes
where {response} IS NOT NULL""".format(response=response))
outcome.write.parquet(response, mode="overwrite") # Success
print(outcome.schema)
# StructType(List(StructField(eid,IntegerType,true),StructField(response,ShortType,true)))

But then:

outcome2 = sqlc.read.parquet(response)  # fail

fails with:

AnalysisException: u'Unable to infer schema for Parquet. It must be specified manually.;'

at

/usr/local/lib/python2.7/dist-packages/pyspark-2.1.0+hadoop2.7-py2.7.egg/pyspark/sql/utils.pyc in deco(*a, **kw)

The documentation for Parquet says the format is self-describing, and the full schema was available when the Parquet file was saved. What gives?

Using Spark 2.1.1. It also fails in 2.2.0.

I found this bug report, but it was fixed in 2.0.1 and 2.1.0.

UPDATE: This works when connected with master="local", and fails when connected to master="mysparkcluster".
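Since the read works with master="local" but not on the cluster, one thing that may be worth checking is whether the output directory on the driver machine actually contains any Parquet data files. Spark typically reports this error when it finds no data files to read a schema from. The sketch below is only a diagnostic aid, not a confirmed fix. It assumes the relative path "mi_or_chd_5" resolves to a local filesystem the driver can see (it does not apply to HDFS or object stores) and that data files carry Spark's default ".parquet" suffix. The helper name find_parquet_data_files is hypothetical.

```python
import os


def find_parquet_data_files(path):
    """Return a sorted list of *.parquet files found under `path`.

    Assumption: `path` is on a local filesystem and the files use the
    default ".parquet" suffix. Marker files such as _SUCCESS are ignored.
    """
    found = []
    for root, _dirs, files in os.walk(path):
        for name in files:
            if name.endswith(".parquet"):
                found.append(os.path.join(root, name))
    return sorted(found)


if __name__ == "__main__":
    # "mi_or_chd_5" is the output directory name used in the question.
    data_files = find_parquet_data_files("mi_or_chd_5")
    if data_files:
        print("Found %d Parquet data file(s)" % len(data_files))
    else:
        print("No Parquet data files found under this path on this machine")
```

If the list comes back empty on the driver even though the write reported success, that would be consistent with the data files having been written somewhere else, for example to the local disks of the worker nodes. That explanation is a guess and is not confirmed by the question itself.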
