_spark_metadata causing problems

Views: 162  Published: 2021/4/8 19:43:15  Tags: scala apache-spark spark-streaming


Question

I am using Spark with Scala, and I have a directory that contains multiple files.

In this directory I have Parquet files generated by Spark and other files generated by Spark Streaming.

Spark Streaming also generates a directory named _spark_metadata.

The problem I am facing is that when I read the directory with Spark (sparksession.read.load), it reads only the data generated by Spark Streaming, as if the other data did not exist.

Does anyone know how to resolve this issue? I think there should be a property to force Spark to ignore the _spark_metadata directory.

Thanks for your help.

Recommended answer

I have the same problem (Spark 2.4.0), and the only workaround I am aware of is to load the files using a mask/pattern, something like this:

sparksession.read.format("parquet").load("/path/*.parquet")

As far as I know, there is no way to ignore this directory. If it exists, Spark will take it into account.
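For anyone who wants to try the mask/pattern approach end to end, here is a minimal sketch of how it might be wrapped in a small program. The object name ReadParquetWithGlob, the helpers parquetGlob and readAllParquet, the application name, and the local[*] master are all hypothetical choices made for illustration; "/path" is the same placeholder used in the one-liner above. It assumes a Spark 2.x or later dependency on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object; names are illustrative, not part of Spark.
object ReadParquetWithGlob {

  // Build a glob that matches only the *.parquet files directly under `dir`,
  // so the reader is pointed at files rather than at the directory itself.
  def parquetGlob(dir: String): String =
    dir.stripSuffix("/") + "/*.parquet"

  // Load every file matched by the glob into a single DataFrame.
  def readAllParquet(spark: SparkSession, dir: String): DataFrame =
    spark.read.format("parquet").load(parquetGlob(dir))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for trying this out.
    val spark = SparkSession.builder()
      .appName("read-parquet-with-glob")
      .master("local[*]")
      .getOrCreate()

    try {
      // "/path" is the placeholder directory from the answer above.
      val df = readAllParquet(spark, "/path")
      df.printSchema()
    } finally {
      spark.stop()
    }
  }
}
```

Note that a pattern such as /path/*.parquet only matches files directly inside /path. If the data is written into partition subdirectories, the pattern would probably need extra levels (for example /path/*/*.parquet), which is something to verify against your own layout.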
