Hadoop S3 access - FileSystem vs FileContext

Question

I'm having trouble accessing the S3 "file-system" from the HDFS FileContext object, but I can do the same thing with the FileSystem object. As I understand it, FileContext has superseded FileSystem, so if I have to fall back to FileSystem it seems I'm doing something wrong. Am I? Or is FileContext just not as functional as the older FileSystem?

My functions (FYI - I'm running this from Jupyter, using Spark 2.1 with Hadoop 2.6.0-cdh5.5.1):

import java.net.URI
import _root_.org.apache.hadoop.fs.{FileContext, FileSystem, Path}

val hdfsConf = spark.sparkContext.hadoopConfiguration

// Existence check through the classic FileSystem API.
def pathExistsFs(bucket: String, pStr: String): Boolean = {
  val p = new Path(pStr)
  val fs = FileSystem.get(new URI(s"s3a://$bucket"), hdfsConf)
  fs.exists(p)
}

// The same check through the newer FileContext API.
def pathExistsFc(bucket: String, pStr: String): Boolean = {
  val p = new Path(pStr)
  val fc = FileContext.getFileContext(new URI(s"s3a://$bucket"), hdfsConf)
  fc.util().exists(p)
}

Output (pathExistsFs works, pathExistsFc fails):

pathExistsFs("myBucket", "myS3Key/path.txt")
>>> res36_5: Boolean = true


pathExistsFc("myBucket", "myS3Key/path.txt") 
>>> org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: s3a...
org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: s3a
org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:154)
org.apache.hadoop.fs.AbstractFileSystem.get(AbstractFileSystem.java:242)
org.apache.hadoop.fs.FileContext$2.run(FileContext.java:337)
org.apache.hadoop.fs.FileContext$2.run(FileContext.java:334)
java.security.AccessController.doPrivileged(Native Method)
javax.security.auth.Subject.doAs(Subject.java:422)
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:334)
org.apache.hadoop.fs.FileContext.getFileContext(FileContext.java:451)
$sess.cmd37Wrapper$Helper$Hadoop$.pathExistsFc(cmd37.sc:14)
$sess.cmd42Wrapper$Helper.<init>(cmd42.sc:8)
$sess.cmd42Wrapper.<init>(cmd42.sc:686)
$sess.cmd42$.<init>(cmd42.sc:545)
$sess.cmd42$.<clinit>(cmd42.sc:-1)

Thanks!

Answer

Stay with the FileSystem API; because of its low-level nature, it's actually where most of the S3 performance work goes. There is now a bridge class from FileContext to the S3AFileSystem class, but that clearly isn't in your CDH version.
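
For reference, on Hadoop 2.8+ the bridge is the org.apache.hadoop.fs.s3a.S3A class (an AbstractFileSystem that delegates to S3AFileSystem), and core-default.xml maps the s3a scheme to it. Here is a minimal sketch of wiring it up explicitly, assuming a Hadoop 2.8+ classpath, the same Spark session and imports as the question above, and a hypothetical pathExistsFcBridged name:

// A sketch, assuming Hadoop 2.8+ where org.apache.hadoop.fs.s3a.S3A exists;
// it will not work on hadoop 2.6.0-cdh5.5.1, which lacks the bridge class.
def pathExistsFcBridged(bucket: String, pStr: String): Boolean = {
  val conf = spark.sparkContext.hadoopConfiguration
  // Register the FileContext-to-S3AFileSystem bridge for the s3a scheme.
  // On Hadoop 2.8+ this mapping already ships in core-default.xml, so the
  // explicit set is only a belt-and-braces fallback.
  conf.set("fs.AbstractFileSystem.s3a.impl", "org.apache.hadoop.fs.s3a.S3A")
  val fc = FileContext.getFileContext(new URI(s"s3a://$bucket"), conf)
  fc.util().exists(new Path(pStr))
}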
