Can I tell spark.read.json that my files are gzipped?
Question
I have an s3 bucket with nearly 100k gzipped JSON files.
These files are called [timestamp].json instead of the more sensible [timestamp].json.gz.
I have other processes that use them so renaming is not an option and copying them is even less ideal.
I am using spark.read.json([pattern]) to read these files. If I rename the files to include the .gz extension this works fine, but whilst the extension is just .json they cannot be read.
Is there any way I can tell Spark that these files are gzipped?
Recommended answer
SparkSession can read a compressed JSON file directly, like this:
val json = spark.read.json("/user/the_file_path/the_json_file.log.gz")
json.printSchema()
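Note that this works because Spark (via Hadoop's codec factory) selects a decompression codec from the filename extension, which is why the .gz path above reads fine while the asker's .json-named files do not. The gzip format itself is identified by magic bytes in the file, not by its name. A minimal Python sketch (hypothetical file name and content) illustrating that the extension is irrelevant to decompression itself:

```python
import gzip
import json
import os
import tempfile

# Create a gzipped JSON file named like the question's [timestamp].json
# layout -- deliberately with no .gz suffix (hypothetical name/content).
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "1620000000.json")
with gzip.open(path, "wt") as f:
    json.dump({"event": "click"}, f)

# gzip detects its format from the file's magic bytes, not the name,
# so decompression succeeds despite the .json extension.
with gzip.open(path, "rt") as f:
    record = json.load(f)

print(record)  # {'event': 'click'}
```

So the limitation is in how Spark chooses a codec, not in gzip: if the files cannot be renamed, some way of telling Spark (or Hadoop) which codec to use is needed rather than relying on the extension.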