How do I read a parquet in PySpark written from Spark?


Problem description

I am using two Jupyter notebooks to do different things in an analysis. In my Scala notebook, I write some of my cleaned data to parquet:

partitionedDF.select("noStopWords","lowerText","prediction").write.save("swift2d://xxxx.keystone/commentClusters.parquet")
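
(Since no format is given, write.save falls back to Spark's default data source, which is parquet in Spark 2.x, so this call does produce a parquet directory at the given path. Writing it as .write.parquet(path) would make that intent explicit.)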

I then go to my Python notebook to read in the data:

df = spark.read.load("swift2d://xxxx.keystone/commentClusters.parquet")

and I get the following error:

AnalysisException: u'Unable to infer schema for ParquetFormat at swift2d://RedditTextAnalysis.keystone/commentClusters.parquet. It must be specified manually;'

I have looked at the spark documentation and I don't think I should be required to specify a schema. Has anyone run into something like this? Should I be doing something else when I save/load? The data is landing in Object Storage.
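
You are right that parquet normally needs no explicit schema: the reader infers it from the parquet file footers. This particular error usually means Spark found no parquet files at the path to infer from, so an empty directory or a mistyped path in the object store is worth ruling out first. If a schema really must be supplied, a minimal sketch of doing so manually follows; the column names come from the select() above, but the types (token array, raw text, cluster id) are assumptions about what the pipeline produced:

from pyspark.sql.types import (StructType, StructField, ArrayType,
                               StringType, IntegerType)

# Hypothetical schema: field names match the columns written above;
# the types are assumptions and should be adjusted to the real data.
schema = StructType([
    StructField("noStopWords", ArrayType(StringType())),
    StructField("lowerText", StringType()),
    StructField("prediction", IntegerType()),
])

# `spark` is the existing SparkSession from the notebook
df = spark.read.schema(schema).load("swift2d://xxxx.keystone/commentClusters.parquet")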

edit: I'm using Spark 2.0 in both the read and the write.

edit2: This was done in a project in Data Science Experience.

Recommended answer

I read the parquet file in the following way:

from pyspark.sql import SparkSession

# build (or reuse) a SparkSession; this also creates the SparkContext
spark = SparkSession.builder \
    .master('local') \
    .appName('myAppName') \
    .config('spark.executor.memory', '5gb') \
    .config("spark.cores.max", "6") \
    .getOrCreate()

sc = spark.sparkContext

# wrap the SparkContext in a SQLContext to get a DataFrame reader
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

# read the parquet file into a DataFrame
df = sqlContext.read.parquet('path-to-file/commentClusters.parquet')
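
The SQLContext detour is legacy API, though; in Spark 2.x the SparkSession reads parquet directly, which matches the question's setup. A minimal sketch (the swift2d path is copied from the question, and the app name is arbitrary; substitute your own):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('readParquet').getOrCreate()

# the SparkSession's reader handles parquet natively; no SQLContext needed
df = spark.read.parquet("swift2d://xxxx.keystone/commentClusters.parquet")
df.printSchema()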

