Hadoop cannot connect to Google Cloud Storage

Question

I'm trying to connect Hadoop running on Google Cloud VM to Google Cloud Storage. I have:

  • Modified core-site.xml to include the fs.gs.impl and fs.AbstractFileSystem.gs.impl properties
  • Downloaded the gcs-connector-latest-hadoop2.jar and referenced it in a generated hadoop-env.sh (a sketch of both files follows this list)
  • Authenticated via gcloud auth login using my personal account (instead of a service account)
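
For reference, here is a minimal sketch of what those two files can look like. The connector class names below are the standard ones shipped with the GCS connector for Hadoop 2; the jar location under /usr/local/lib is only an assumption, so substitute wherever you actually placed the download.

<!-- core-site.xml: register the GCS connector with Hadoop -->
<property>
  <name>fs.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
</property>
<property>
  <name>fs.AbstractFileSystem.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
</property>

# hadoop-env.sh: put the connector jar on the Hadoop classpath
# (/usr/local/lib/gcs-connector-latest-hadoop2.jar is an assumed path)
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/local/lib/gcs-connector-latest-hadoop2.jar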

I'm able to run gsutil ls gs://mybucket/ without any issues, but when I execute

hadoop fs -ls gs://mybucket/

I get the output:

14/09/30 23:29:31 INFO gcs.GoogleHadoopFileSystemBase: GHFS version: 1.2.9-hadoop2 

ls: Error getting access token from metadata server at: http://metadata/computeMetadata/v1/instance/service-accounts/default/token

What steps am I missing to get Hadoop to see Google Cloud Storage?

Thanks!

Solution

By default, the gcs-connector running on Google Compute Engine is optimized to use the built-in service-account mechanisms. To force it to use the oauth2 flow instead, a few extra configuration keys need to be set: borrow the same "client_id" and "client_secret" that gcloud auth uses, add them to your core-site.xml, and disable fs.gs.auth.service.account.enable:

<property>
  <name>fs.gs.auth.service.account.enable</name>
  <value>false</value>
</property>
<property>
  <name>fs.gs.auth.client.id</name>
  <value>32555940559.apps.googleusercontent.com</value>
</property>
<property>
  <name>fs.gs.auth.client.secret</name>
  <value>ZmssLNjJy2998hD4CTg2ejr2</value>
</property>

You can optionally also set fs.gs.auth.client.file to something other than its default of ~/.credentials/storage.json.
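
For example, a property entry pointing the connector at its own credential file (the path below is only an illustration; any writable location works):

<property>
  <name>fs.gs.auth.client.file</name>
  <value>/home/hadoop/.credentials/gcs-connector-storage.json</value>
</property>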

If you do this, then when you run hadoop fs -ls gs://mybucket you'll see a new prompt, similar to the "gcloud auth login" prompt, where you visit a URL in a browser and enter a verification code again. Unfortunately, the connector can't consume a gcloud-generated credential directly, even though it can share a credential-store file, because it asks explicitly for the GCS scopes it needs. You'll notice that the new auth flow asks only for GCS scopes, as opposed to the big list of services that "gcloud auth login" requests.

Make sure you've also set fs.gs.project.id in your core-site.xml:

<property>
  <name>fs.gs.project.id</name>
  <value>your-project-id</value>
</property>

since the GCS connector doesn't automatically infer a default project from your gcloud configuration.
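
If you're unsure of the project ID, you can read it back from gcloud (assuming a default project was set when you configured gcloud):

gcloud config list project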
