Hadoop cannot connect to Google Cloud Storage

Problem description

I'm trying to connect Hadoop running on Google Cloud VM to Google Cloud Storage. I have:

  • Modified core-site.xml to include the fs.gs.impl and fs.AbstractFileSystem.gs.impl properties (see the sketch after this list)
  • Downloaded the gcs-connector-latest-hadoop2.jar and referenced it in a generated hadoop-env.sh
  • Authenticated via gcloud auth login using my personal account (instead of a service account)
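
For reference, those first two steps look roughly like the following. This is only a sketch: the two class names are the ones documented for the Hadoop 2 GCS connector, while the jar path is an assumed install location you should adjust.

<property>
  <name>fs.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
</property>
<property>
  <name>fs.AbstractFileSystem.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
</property>

And in hadoop-env.sh:

# The jar path below is an assumption; point it at wherever you placed the connector jar.
export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:/usr/local/lib/gcs-connector-latest-hadoop2.jar"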

I'm able to run gsutil ls gs://mybucket/ without any issues, but when I execute

hadoop fs -ls gs://mybucket/

I get the output:

14/09/30 23:29:31 INFO gcs.GoogleHadoopFileSystemBase: GHFS version: 1.2.9-hadoop2 

ls: Error getting access token from metadata server at: http://metadata/computeMetadata/v1/instance/service-accounts/default/token

What steps am I missing to get Hadoop to see Google Cloud Storage?

Thanks!

Solution

By default, the gcs-connector running on Google Compute Engine is optimized to use the built-in service-account mechanisms, so to force it to use the OAuth2 flow, a few extra configuration keys need to be set. You can borrow the same "client_id" and "client_secret" from gcloud auth as follows, adding them to your core-site.xml and disabling fs.gs.auth.service.account.enable:

<property>
  <name>fs.gs.auth.service.account.enable</name>
  <value>false</value>
</property>
<property>
  <name>fs.gs.auth.client.id</name>
  <value>32555940559.apps.googleusercontent.com</value>
</property>
<property>
  <name>fs.gs.auth.client.secret</name>
  <value>ZmssLNjJy2998hD4CTg2ejr2</value>
</property>

You can optionally also set fs.gs.auth.client.file to something other than its default of ~/.credentials/storage.json.
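
For example, a sketch (the path here is purely illustrative; any writable location works):

<property>
  <name>fs.gs.auth.client.file</name>
  <value>/home/myuser/.credentials/gcs-connector-storage.json</value>
</property>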

If you do this, then when you run hadoop fs -ls gs://mybucket you'll see a new prompt, similar to the "gcloud auth login" prompt, where you'll visit a browser and enter a verification code again. Unfortunately, the connector can't consume a gcloud-generated credential directly, even though it can potentially share a credential-store file, since it explicitly asks for only the GCS scopes it needs (you'll notice that the new auth flow asks only for GCS scopes, as opposed to the big list of services that "gcloud auth login" requests).

Make sure you've also set fs.gs.project.id in your core-site.xml:

<property>
  <name>fs.gs.project.id</name>
  <value>your-project-id</value>
</property>

since the GCS connector likewise doesn't automatically infer a default project from the related gcloud auth.
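
With all of the above in core-site.xml, re-running hadoop fs -ls gs://mybucket/ should trigger the browser-based OAuth2 prompt described earlier instead of failing against the metadata server, and subsequent invocations should list the bucket directly using the stored credential.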
