How to read Parquet files under a directory using PySpark?

Problem description

I have searched online and the solutions provided there didn't resolve my issue. I am trying to read Parquet files under a hierarchical directory structure. I am getting the following error.

'Unable to infer schema for Parquet. It must be specified manually.;'

My directory structure looks like: dbfs:/mnt/sales/region/country/2020/08/04

There are multiple sub-directories for the months under each year folder, and further sub-directories for the days under each month.

I only want to read them at the sales level, which should give me the data for all regions. I've tried both of the codes below, but neither of them worked. Please help me with this.

spark.read.parquet("dbfs:/mnt/sales/*")

spark.read.parquet("dbfs:/mnt/sales/")

Answer

Can you try this option?

df = spark.read.option("header","true").option("recursiveFileLookup","true").parquet("/path/to/root/")
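Below is a minimal sketch of how that option might be applied to the directory layout from the question. The dbfs:/mnt/sales/ root path comes from the question; the SparkSession builder call and the app name are assumptions for running outside Databricks, where a session named spark already exists.

from pyspark.sql import SparkSession

# On Databricks a SparkSession named `spark` already exists; this builder
# call is only needed when running the snippet elsewhere.
spark = SparkSession.builder.appName("read-sales-parquet").getOrCreate()

# recursiveFileLookup tells Spark to walk every sub-directory
# (region/country/year/month/day) and load all Parquet files it finds.
df = (
    spark.read
    .option("recursiveFileLookup", "true")
    .parquet("dbfs:/mnt/sales/")
)

df.printSchema()

Note that recursiveFileLookup requires Spark 3.0 or later and disables partition discovery, so the directory names (region, country, year, and so on) will not show up as columns. On older Spark versions, a wildcard path matching the full depth of the tree (for example dbfs:/mnt/sales/*/*/*/*/*) is a common workaround.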
