YARN log aggregation on AWS EMR - UnsupportedFileSystemException
I am struggling to enable YARN log aggregation for my Amazon EMR cluster. I am following this documentation for the configuration:
Under the section titled: "To aggregate logs in Amazon S3 using the AWS CLI".
I've verified that the hadoop-config bootstrap action puts the following in yarn-site.xml:
<property><name>yarn.log-aggregation-enable</name><value>true</value></property>
<property><name>yarn.log-aggregation.retain-seconds</name><value>-1</value></property>
<property><name>yarn.log-aggregation.retain-check-interval-seconds</name><value>3000</value></property>
<property><name>yarn.nodemanager.remote-app-log-dir</name><value>s3://mybucket/logs</value></property>
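As an aside, on newer EMR releases the same properties can be supplied as a configuration classification at cluster creation instead of a bootstrap action. A sketch, mirroring the values above (the bucket name is a placeholder):

```json
[
  {
    "Classification": "yarn-site",
    "Properties": {
      "yarn.log-aggregation-enable": "true",
      "yarn.log-aggregation.retain-seconds": "-1",
      "yarn.log-aggregation.retain-check-interval-seconds": "3000",
      "yarn.nodemanager.remote-app-log-dir": "s3://mybucket/logs"
    }
  }
]
```

This file would be passed via `aws emr create-cluster ... --configurations file://yarn-config.json`.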
I can run a sample job (pi from hadoop-examples.jar) and see that it completed successfully on the ResourceManager's GUI.
It even creates a folder under s3://mybucket/logs named with the application id. But the folder is empty, and if I run yarn logs -applicationId <applicationId>, I get a stacktrace:
14/10/20 23:02:15 INFO client.RMProxy: Connecting to ResourceManager at /10.XXX.XXX.XXX:9022
Exception in thread "main" org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: s3
at org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:154)
at org.apache.hadoop.fs.AbstractFileSystem.get(AbstractFileSystem.java:242)
at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:333)
at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:330)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:330)
at org.apache.hadoop.fs.FileContext.getFSofPath(FileContext.java:322)
at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:85)
at org.apache.hadoop.fs.FileContext.listStatus(FileContext.java:1388)
at org.apache.hadoop.yarn.logaggregation.LogCLIHelpers.dumpAllContainersLogs(LogCLIHelpers.java:112)
at org.apache.hadoop.yarn.client.cli.LogsCLI.run(LogsCLI.java:137)
at org.apache.hadoop.yarn.client.cli.LogsCLI.main(LogsCLI.java:199)
Which doesn't make any sense to me; I can run hdfs dfs -ls s3://mybucket/ and it lists the contents just fine. The machines are getting credentials from AWS IAM Roles, and I've tried adding fs.s3n.awsAccessKeyId and such to core-site.xml, with no change in behavior.
Any advice is much appreciated.
Hadoop provides two fs interfaces - FileSystem and AbstractFileSystem. Most of the time, we work with FileSystem and use configuration options like fs.s3.impl to provide custom adapters. yarn logs, however, uses the AbstractFileSystem interface.
If you can find an implementation of that for S3, you can specify it using fs.AbstractFileSystem.s3.impl.
See core-default.xml for examples such as fs.AbstractFileSystem.hdfs.impl.
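For illustration, a minimal core-site.xml sketch, assuming the S3A connector from the hadoop-aws module is on the classpath (recent Hadoop releases ship an AbstractFileSystem binding for it, org.apache.hadoop.fs.s3a.S3A; the plain s3:// scheme used on EMR at the time had no such binding, which is exactly what the exception reports):

```xml
<property>
  <name>fs.AbstractFileSystem.s3a.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3A</value>
</property>
```

With that in place, yarn.nodemanager.remote-app-log-dir would need to point at an s3a:// URI (e.g. s3a://mybucket/logs) so that both the NodeManagers and the yarn logs client resolve the scheme through the same binding.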