YARN log aggregation on AWS EMR - UnsupportedFileSystemException


Problem Description

I am struggling to enable YARN log aggregation for my Amazon EMR cluster. I am following this documentation for the configuration:

http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-debugging.html#emr-plan-debugging-logs-archive

Under the section titled: "To aggregate logs in Amazon S3 using the AWS CLI".

I've verified that the hadoop-config bootstrap action puts the following in yarn-site.xml:

<property><name>yarn.log-aggregation-enable</name><value>true</value></property>
<property><name>yarn.log-aggregation.retain-seconds</name><value>-1</value></property>
<property><name>yarn.log-aggregation.retain-check-interval-seconds</name><value>3000</value></property>
<property><name>yarn.nodemanager.remote-app-log-dir</name><value>s3://mybucket/logs</value></property>

I can run a sample job (pi from hadoop-examples.jar) and see that it completed successfully on the ResourceManager's GUI.
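For reference, the submission and log retrieval commands look roughly like this (the jar path and application id are illustrative placeholders, not values from the cluster):

hadoop jar /home/hadoop/hadoop-examples.jar pi 10 1000
yarn logs -applicationId application_1413845999999_0001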

It even creates a folder under s3://mybucket/logs named with the application id. But the folder is empty, and if I run yarn logs -applicationId <applicationId>, I get a stacktrace:

14/10/20 23:02:15 INFO client.RMProxy: Connecting to ResourceManager at /10.XXX.XXX.XXX:9022
Exception in thread "main" org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: s3
    at org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:154)
    at org.apache.hadoop.fs.AbstractFileSystem.get(AbstractFileSystem.java:242)
    at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:333)
    at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:330)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:330)
    at org.apache.hadoop.fs.FileContext.getFSofPath(FileContext.java:322)
    at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:85)
    at org.apache.hadoop.fs.FileContext.listStatus(FileContext.java:1388)
    at org.apache.hadoop.yarn.logaggregation.LogCLIHelpers.dumpAllContainersLogs(LogCLIHelpers.java:112)
    at org.apache.hadoop.yarn.client.cli.LogsCLI.run(LogsCLI.java:137)
    at org.apache.hadoop.yarn.client.cli.LogsCLI.main(LogsCLI.java:199) 

This doesn't make any sense to me; I can run hdfs dfs -ls s3://mybucket/ and it lists the contents just fine. The machines get their credentials from AWS IAM roles, and I've tried adding fs.s3n.awsAccessKeyId and the like to core-site.xml, with no change in behavior.
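The credential properties tried were of this form (the values shown are placeholders):

<property><name>fs.s3n.awsAccessKeyId</name><value>YOUR_ACCESS_KEY_ID</value></property>
<property><name>fs.s3n.awsSecretAccessKey</name><value>YOUR_SECRET_ACCESS_KEY</value></property>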

Any advice is much appreciated.

Solution

Hadoop provides two fs interfaces - FileSystem and AbstractFileSystem. Most of the time, we work with FileSystem and use configuration options like fs.s3.impl to provide custom adapters.

yarn logs, however, uses the AbstractFileSystem interface.
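A minimal sketch of the difference, using the same s3://mybucket/logs URI as above: resolving the path through FileSystem succeeds because fs.s3.impl is configured, while constructing a FileContext for the same URI throws the UnsupportedFileSystemException from the stacktrace unless an AbstractFileSystem binding exists for the s3 scheme.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.FileSystem;

public class FsVsAbstractFs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        URI uri = new URI("s3://mybucket/logs");

        // The FileSystem interface: resolved via fs.s3.impl. This is the
        // route `hdfs dfs -ls s3://mybucket/` takes, and it works.
        FileSystem fs = FileSystem.get(uri, conf);
        System.out.println("FileSystem resolved: " + fs.getUri());

        // The AbstractFileSystem interface: resolved via
        // fs.AbstractFileSystem.s3.impl. This is the route `yarn logs`
        // takes, and it throws UnsupportedFileSystemException when no
        // implementation is registered for the s3 scheme.
        FileContext fc = FileContext.getFileContext(uri, conf);
        System.out.println("FileContext resolved: " + fc.getWorkingDirectory());
    }
}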

If you can find an implementation of that for S3, you can specify it using fs.AbstractFileSystem.s3.impl.
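For illustration, the entry would take the same shape as the other filesystem bindings; the class name below is a placeholder for whatever implementation you find, which must extend org.apache.hadoop.fs.AbstractFileSystem:

<property><name>fs.AbstractFileSystem.s3.impl</name><value>com.example.fs.S3Fs</value></property>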

See core-default.xml for examples of fs.AbstractFileSystem.hdfs.impl etc.
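For comparison, the HDFS binding in core-default.xml looks like this:

<property>
  <name>fs.AbstractFileSystem.hdfs.impl</name>
  <value>org.apache.hadoop.fs.Hdfs</value>
</property>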
