Hadoop cannot connect to Google Cloud Storage
Problem description
I'm trying to connect Hadoop running on a Google Cloud VM to Google Cloud Storage. I have:
- Modified the core-site.xml to include properties of fs.gs.impl and fs.AbstractFileSystem.gs.impl
- Downloaded and referenced the gcs-connector-latest-hadoop2.jar in a generated hadoop-env.sh
- Authenticated via gcloud auth login using my personal account (instead of a service account).
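For reference, the first two steps above usually look something like the following. The class names are the standard ones shipped with the Hadoop 2 GCS connector; the jar path is a placeholder for wherever you placed the downloaded jar:

```xml
<!-- core-site.xml: register the GCS connector's filesystem implementations -->
<property>
  <name>fs.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
</property>
<property>
  <name>fs.AbstractFileSystem.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
</property>
```

In hadoop-env.sh, the jar then gets added to the classpath, e.g. `export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/path/to/gcs-connector-latest-hadoop2.jar` (adjust the path to your installation).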
I'm able to run gsutil ls gs://mybucket/ without any issues, but when I execute
hadoop fs -ls gs://mybucket/
I get the output:
14/09/30 23:29:31 INFO gcs.GoogleHadoopFileSystemBase: GHFS version: 1.2.9-hadoop2
ls: Error getting access token from metadata server at: http://metadata/computeMetadata/v1/instance/service-accounts/default/token
What steps am I missing to get Hadoop to see Google Cloud Storage?
Thanks!
By default, when running on Google Compute Engine, the gcs-connector is optimized to use the built-in service-account mechanism. To force it to use the OAuth2 flow instead, a few extra configuration keys need to be set: you can borrow the same "client_id" and "client_secret" from gcloud auth as follows, add them to your core-site.xml, and also disable fs.gs.auth.service.account.enable:
<property>
<name>fs.gs.auth.service.account.enable</name>
<value>false</value>
</property>
<property>
<name>fs.gs.auth.client.id</name>
<value>32555940559.apps.googleusercontent.com</value>
</property>
<property>
<name>fs.gs.auth.client.secret</name>
<value>ZmssLNjJy2998hD4CTg2ejr2</value>
</property>
You can optionally also set fs.gs.auth.client.file to something other than its default of ~/.credentials/storage.json.
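If you do override it, it's just one more property in core-site.xml; the path below is purely illustrative:

```xml
<property>
  <name>fs.gs.auth.client.file</name>
  <value>/home/hadoop/.credentials/gcs-connector.json</value>
</property>
```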
If you do this, then when you run hadoop fs -ls gs://mybucket you'll see a new prompt, similar to the "gcloud auth login" prompt, where you'll visit a browser and enter a verification code again. Unfortunately, the connector can't quite consume a "gcloud"-generated credential directly, even though it may share a credential-store file, since it explicitly asks for the GCS scopes it needs (you'll notice the new auth flow asks only for GCS scopes, as opposed to the big list of services that "gcloud auth login" requests).
Make sure you've also set fs.gs.project.id in your core-site.xml:
<property>
<name>fs.gs.project.id</name>
<value>your-project-id</value>
</property>
since the GCS connector likewise doesn't automatically infer a default project from the related gcloud auth.