Hive on Google Cloud wants permissions on /tmp, but no way to change permissions


Problem description

I'm trying to run Hive on Google Cloud, where Hadoop was installed by click-to-deploy. Hive seems to install just fine, but when I run hive I get the following error output:

Logging initialized using configuration in jar:file:/home/michael_w_sherman_gmail_com/apache-hive-0.14.0-bin/lib/hive-common-0.14.0.jar!/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-install/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/michael_w_sherman_gmail_com/apache-hive-0.14.0-bin/lib/hive-jdbc-0.14.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx------
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:444)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:672)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx------
        at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:529)
        at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:478)
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:430)
        ... 7 more

My first fix was to check hdfs-site.xml and change the dfs.permissions.enabled setting, but it was already set to false. Next, I tried to chmod the permissions. But the chmod changes don't take.
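For reference, a quick way to double-check that setting without opening the XML by hand (a sketch; hdfs getconf reads the effective configuration, and I had confirmed the same value in hdfs-site.xml):

# Prints the effective value of the permissions toggle; on this cluster it
# already reports "false", so permissions enforcement wasn't the culprit.
hdfs getconf -confKey dfs.permissions.enabled

The failed chmod attempts are below.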

$ hadoop fs -ls
15/01/28 23:03:13 INFO gcs.GoogleHadoopFileSystemBase: GHFS version: 1.2.9-hadoop2
Found 8 items
....
drwx------ - xxxx_gmail_com xxxx_gmail_com 0 2015-01-28 21:54 tmp

$ hadoop fs -chmod -R 777 /tmp
15/01/28 23:03:31 INFO gcs.GoogleHadoopFileSystemBase: GHFS version: 1.2.9-hadoop2

$ hadoop fs -ls
15/01/28 23:09:35 INFO gcs.GoogleHadoopFileSystemBase: GHFS version: 1.2.9-hadoop2
Found 8 items
....
drwx------ - xxx_gmail_com xxx_gmail_com 0 2015-01-28 21:54 tmp

Different chmod options, like a+w, fail to change the permissions. The owner/group of the file is always the ssh user (the log above is from an ssh terminal launched from Google Cloud's console, which uses your email as a username). But I have the same problem when I ssh in.

How do I either change the permissions or get Hive to not give the error?

Thank you.

Solution

For the time being, the GCS connector for Hadoop doesn't support fine-grained HDFS permissions, and thus the reported 700 is "fake"; in fact, permissions are controlled via GCS ACLs, and if using a service account with read/write access, any Linux user in the authenticated GCE VM is in fact able to read/write/execute all files inside GCS.
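To make that concrete, here is one way to inspect the access control that actually applies; this is an illustrative sketch, and the bucket name is a placeholder rather than anything from the original cluster:

# GCS ACLs, not the reported POSIX bits, govern access; inspect them directly.
# Replace the placeholder with the bucket your cluster is configured against.
gsutil acl get gs://your-cluster-bucket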

It appears Hive 0.14.0 newly introduces an unfortunate attempt to check for a minimum permission of 733 on the root scratch dir, even though accessibility would have worked out just fine if it had simply ignored the permissions. Unfortunately, for the moment, the required permissions aren't configurable in Hive's SessionState, nor in the GCS connector for Hadoop; in a future release, we can potentially provide a config setting for the GCS connector to specify what permissions to report, and/or implement full fine-grained POSIX permissions on all directories.

In the meantime, it appears Hive 0.13.0 doesn't have the same unfortunate check, so if you're okay with the slightly older Hive version, it should work just fine.
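If you want to try that on the existing VM rather than redeploying, a minimal sketch of swapping in Hive 0.13.0 would look like the following; the download URL assumes the standard Apache archive layout, and the install location mirrors the home-directory 0.14.0 install shown above:

# Fetch and unpack Hive 0.13.0 next to the existing install.
wget https://archive.apache.org/dist/hive/hive-0.13.0/apache-hive-0.13.0-bin.tar.gz
tar -xzf apache-hive-0.13.0-bin.tar.gz
# Put the 0.13.0 binaries first on PATH for this shell session, then launch.
export HIVE_HOME="$PWD/apache-hive-0.13.0-bin"
export PATH="$HIVE_HOME/bin:$PATH"
hive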

Important: That said, note that the "click to deploy" solution doesn't currently officially support Pig or Hive, in part because it doesn't yet set up the more advanced "NFS consistency cache" introduced in gcs-connector-1.3.0/bdutil-0.36.4, with automated setup of the list-consistency cache. Without the list-consistency cache, Hive and Pig may unexpectedly lose data since they rely on "ls" to commit temporary files.

Your best bet is actually to download the latest bdutil-1.1.0 and use that instead; it supports Pig and Hive with:

./bdutil -e querytools deploy

or equivalently:

./bdutil -e extensions/querytools/querytools_env.sh deploy

Inside that querytools_env.sh file, you'll find:

# URIs of tarballs to install.
PIG_TARBALL_URI='gs://querytools-dist/pig-0.12.0.tar.gz'
HIVE_TARBALL_URI='gs://querytools-dist/hive-0.12.0-bin.tar.gz'

You may optionally upload your own Hive version to your own bucket and modify HIVE_TARBALL_URI for bdutil to pick it up. Hive 0.14.0 still won't work, but you might have luck with Hive 0.13.0. Alternatively, if you don't care too much about the version, the default Hive 0.12.0 receives continuous testing and validation from Google's engineering teams, so you'll have a better-validated experience. You can also view bdutil's contents on GitHub at https://github.com/GoogleCloudPlatform/bdutil
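For example, a minimal sketch of that override; the bucket name is hypothetical and the tarball name assumes the standard Apache binary release naming:

# Upload your chosen Hive tarball to a bucket you control (name is an example).
gsutil cp apache-hive-0.13.0-bin.tar.gz gs://your-bucket/
# Point bdutil at it by editing extensions/querytools/querytools_env.sh:
HIVE_TARBALL_URI='gs://your-bucket/apache-hive-0.13.0-bin.tar.gz'
# Then redeploy with the querytools extension enabled.
./bdutil -e querytools deploy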

