Access files that start with underscore in Apache Spark

Views: 72 Published: 2021/4/8 19:23:36 Tags: hadoop apache-spark


Problem description

I am trying to access gz files on S3 whose names start with _ in Apache Spark. Unfortunately, Spark treats these files as invisible and returns Input path does not exist: s3n:.../_1013.gz. If I remove the underscore, it finds the file just fine.

I tried adding a custom PathFilter to the Hadoop configuration:

package CustomReader

import org.apache.hadoop.fs.{Path, PathFilter}

class GFilterZip extends PathFilter {
  override def accept(path: Path): Boolean = {
    true
  }
}
// In the Spark driver: register the filter on the Hadoop configuration
sc.hadoopConfiguration.setClass(
  "mapreduce.input.pathFilter.class",
  classOf[CustomReader.GFilterZip],
  classOf[org.apache.hadoop.fs.PathFilter])

But I still have the same problem. Any ideas?

System: Apache Spark 1.6.0 with Hadoop 2.3
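
A possible explanation for why the accept-everything filter changes nothing, offered as a sketch and not as a confirmed answer: in Hadoop 2.x, FileInputFormat appears to apply a built-in hidden-file filter (rejecting names that start with "_" or ".") and to combine it with the user-supplied filter so that a path is listed only if all filters accept it. A custom filter can therefore narrow the set of files further but cannot bring a hidden file back. The snippet below models that AND logic with plain Scala functions. The names HiddenFilterSketch, NameFilter, acceptAll and accepted are hypothetical and are not part of the Hadoop or Spark APIs.

```scala
// Simplified model of how path filters seem to be combined in Hadoop's
// FileInputFormat. Plain functions stand in for org.apache.hadoop.fs.PathFilter;
// nothing here calls Hadoop or Spark.
object HiddenFilterSketch {
  // Hypothetical alias: a filter decides on a file name alone.
  type NameFilter = String => Boolean

  // Models the built-in rule: names starting with "_" or "." are skipped.
  val hiddenFileFilter: NameFilter =
    name => !name.startsWith("_") && !name.startsWith(".")

  // Stand-in for the GFilterZip class in the question: accepts everything.
  val acceptAll: NameFilter = _ => true

  // A file is listed only if every filter accepts it (logical AND).
  def accepted(name: String, filters: Seq[NameFilter]): Boolean =
    filters.forall(f => f(name))

  def main(args: Array[String]): Unit = {
    val filters = Seq(hiddenFileFilter, acceptAll)
    Seq("_1013.gz", "1013.gz").foreach { n =>
      println(s"$n -> ${accepted(n, filters)}")
    }
  }
}
```

Under this model, _1013.gz is rejected even though acceptAll returns true, which matches the behaviour described in the question. If the assumption holds, the practical options would be to rename the files or to read them through an input format that does not apply the hidden-file rule.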
