Hadoop cannot connect to Google Cloud Storage
Problem description
I'm trying to connect Hadoop running on a Google Cloud VM to Google Cloud Storage. I have:
- Modified core-site.xml to include the fs.gs.impl and fs.AbstractFileSystem.gs.impl properties
- Downloaded gcs-connector-latest-hadoop2.jar and referenced it in the generated hadoop-env.sh
- Authenticated via gcloud auth login using my personal account (instead of a service account)
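For reference, the first two steps usually look something like the following. The class names come from the gcs-connector documentation; the jar path is a placeholder for wherever you downloaded it:

```xml
<!-- core-site.xml: register the GCS filesystem implementations -->
<property>
  <name>fs.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
</property>
<property>
  <name>fs.AbstractFileSystem.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
</property>
```

and in hadoop-env.sh, something like `export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/path/to/gcs-connector-latest-hadoop2.jar` (the path is a placeholder).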
I'm able to run gsutil ls gs://mybucket/ without any issues, but when I execute
hadoop fs -ls gs://mybucket/
I get the output:
14/09/30 23:29:31 INFO gcs.GoogleHadoopFileSystemBase: GHFS version: 1.2.9-hadoop2
ls: Error getting access token from metadata server at: http://metadata/computeMetadata/v1/instance/service-accounts/default/token
What steps am I missing to get Hadoop to see Google Cloud Storage?
Thanks!
By default, when running on Google Compute Engine, the gcs-connector is optimized to use the built-in service-account mechanisms. To force it to use the OAuth2 flow instead, a few extra configuration keys need to be set. You can borrow the same client_id and client_secret that gcloud auth uses, add them to your core-site.xml, and also disable fs.gs.auth.service.account.enable:
<property>
  <name>fs.gs.auth.service.account.enable</name>
  <value>false</value>
</property>
<property>
  <name>fs.gs.auth.client.id</name>
  <value>32555940559.apps.googleusercontent.com</value>
</property>
<property>
  <name>fs.gs.auth.client.secret</name>
  <value>ZmssLNjJy2998hD4CTg2ejr2</value>
</property>
You can optionally also set fs.gs.auth.client.file to something other than its default of ~/.credentials/storage.json.
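For example, to store the connector's credential in its own file (the path below is hypothetical):

```xml
<property>
  <name>fs.gs.auth.client.file</name>
  <value>/home/myuser/.credentials/gcs-connector.json</value>
</property>
```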
If you do this, then when you run hadoop fs -ls gs://mybucket you'll see a new prompt, similar to the gcloud auth login prompt, where you'll visit a URL in a browser and enter a verification code again. Unfortunately, the connector can't consume a gcloud-generated credential directly, even though it can possibly share a credential-store file, because it asks explicitly for the GCS scopes it needs (you'll notice the new auth flow asks only for GCS scopes, as opposed to a long list of services like gcloud auth login does).
Make sure you've also set fs.gs.project.id in your core-site.xml:
<property>
  <name>fs.gs.project.id</name>
  <value>your-project-id</value>
</property>
since the GCS connector doesn't automatically infer a default project from the gcloud auth credentials.