Can't access Bluemix object store from my Notebook


Problem description

I'm trying to read a couple of JSON files from my Bluemix object store into a Jupyter notebook using Python. I've followed the examples I've found, but I'm still getting a "No such file or directory" error.

Here is the code that should authenticate to the object store and identify the files:

# Set up Spark
from pyspark import SparkContext
from pyspark import SparkConf

if('config' not in globals()):
    config = SparkConf().setAppName('warehousing_sql').setMaster('local')
if('sc' not in globals()):
    sc= SparkContext(conf=config)

# Set the Hadoop configuration.
def set_hadoop_config(name, credentials):
    prefix = "fs.swift.service." + name
    hconf = sc._jsc.hadoopConfiguration()
    hconf.set(prefix + ".auth.url", credentials['auth_url']+'/v3/auth/tokens')
    hconf.set(prefix + ".auth.endpoint.prefix", "endpoints")
    hconf.set(prefix + ".tenant", credentials['project_id'])
    hconf.set(prefix + ".username", credentials['user_id'])
    hconf.set(prefix + ".password", credentials['password'])
    hconf.setInt(prefix + ".http.port", 8080)
    hconf.set(prefix + ".region", credentials['region'])
    hconf.setBoolean(prefix + ".public", True)

# Data Sources (generated by Insert to code)
credentials = {
  'auth_url':'https://identity.open.softlayer.com',
  'project':'***',
  'project_id':'****',
  'region':'dallas',
  'user_id':'****',
  'domain_id':'****',
  'domain_name':'****',
  'username':'****',
  'password':"""****""",
  'filename':'Warehousing-data.json',
  'container':'notebooks',
  'tenantId':'****'
}

set_hadoop_config('spark', credentials)

# The data files should now be accessible through URLs of the form
# swift://notebooks.spark/filename.json
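For what it's worth, the swift:// URLs produced by this configuration are Hadoop filesystem URLs: they are meaningful only to Spark's readers (e.g. sc.textFile or sqlContext.read.json), not to Python's built-in open(). A minimal sketch of how such a URL is composed (the swift_url helper is hypothetical, not part of any API):

```python
# The swift:// scheme is resolved by Hadoop's OpenStack Swift driver, so these
# URLs only work when handed to Spark, never to plain Python file I/O.
def swift_url(container, service, filename):
    # Pattern: swift://<container>.<service-name>/<object-name>
    return "swift://{0}.{1}/{2}".format(container, service, filename)

url = swift_url("notebooks", "spark", "Warehousing-data.json")
print(url)  # swift://notebooks.spark/Warehousing-data.json

# A Spark read would then look like this (requires a configured cluster):
# df = sqlContext.read.json(url)
```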

Here is the calling code:

...
resource_path= "swift://notebooks.spark/"
Warehousing_data_json = "Warehousing-data.json"
Warehousing_sales_data_nominal_scenario_json = "Warehousing-sales_data-nominal_scenario.json"
...

Here is the error: IOError: [Errno 2] No such file or directory: 'swift://notebooks.spark/Warehousing-data.json'

I'm sorry if this seems like a novice question (which I admit I am), but I think it's ridiculously complicated to set this up and really bad form to rely on an undocumented method SparkContext._jsc.hadoopConfiguration().

Added in response to Hobert's and Sven's comments:

Thanks Hobert. I don't understand your comment about the definition for "swift://notebooks.spark/". Unless I misunderstand the logic of the sample I followed (which is essentially identical to what Sven shows in his response), this path results from the call to sc._jsc.hadoopConfiguration(), but it's hard to know what this call actually does, since the HadoopConfiguration class is not documented.

I also do not understand the alternatives to "use/add that definition for the Hadoop configuration" or "alternatively, … use swift client inside of Spark to access the JSON." I suppose I would prefer the latter since I make no other use of Hadoop in my notebook. Please point me to a more detailed explanation of these alternatives.

Thanks Sven. You are correct that I did not show the actual reading of the JSON files. The reading actually occurs within a method that is part of the API for DOcplexcloud. Here is the relevant code in my notebook:

resource_path= "swift://notebooks.spark/"
Warehousing_data_json = "Warehousing-data.json"
Warehousing_sales_data_nominal_scenario_json = "Warehousing-sales_data-nominal_scenario.json"

resp = client.execute(input= [{'name': "warehousing.mod",
                               'file': StringIO(warehousing_data_dotmod + warehousing_inputs + warehousing_dotmod + warehousing_outputs)},
                              {'name': Warehousing_data_json,
                               'filename': resource_path + Warehousing_data_json},
                              {'name': Warehousing_sales_data_nominal_scenario_json,
                               'filename': resource_path + Warehousing_sales_data_nominal_scenario_json}],
                      output= "results.json",
                      load_solution= True,
                      log= "solver.log",
                      gzip= True,
                      waittime= 300,
                      delete_on_completion= True)

Here is the stack trace:

IOError                                   Traceback (most recent call last)
<ipython-input-8-67cf709788b3> in <module>()
     29                       gzip= True,
     30                       waittime= 300,
---> 31                       delete_on_completion= True)
     32 
     33 result = WarehousingResult(json.loads(resp.solution.decode("utf-8")))

/gpfs/fs01/user/sbf1-4c17d3407da8d0-a7ea98a5cc6d/.local/lib/python2.7/site-packages/docloud/job.pyc in execute(self, input, output, load_solution, log, delete_on_completion, timeout, waittime, gzip, parameters)
    496         # submit job
    497         jobid = self.submit(input=input, timeout=timeout, gzip=gzip,
--> 498                             parameters=parameters)
    499         response = None
    500         completed = False

/gpfs/fs01/user/sbf1-4c17d3407da8d0-a7ea98a5cc6d/.local/lib/python2.7/site-packages/docloud/job.pyc in submit(self, input, timeout, gzip, parameters)
    436                                 gzip=gzip,
    437                                 timeout=timeout,
--> 438                                 parameters=parameters)
    439         # run model
    440         self.execute_job(jobid, timeout=timeout)

/gpfs/fs01/user/sbf1-4c17d3407da8d0-a7ea98a5cc6d/.local/lib/python2.7/site-packages/docloud/job.pyc in create_job(self, **kwargs)
    620                 self.upload_job_attachment(job_id, 
    621                                            attid=inp.name,
--> 622                                            data=inp.get_data(),
    623                                            gzip=gzip)
    624         return job_id

/gpfs/fs01/user/sbf1-4c17d3407da8d0-a7ea98a5cc6d/.local/lib/python2.7/site-packages/docloud/job.pyc in get_data(self)
    110         data = self.data
    111         if self.filename is not None:
--> 112             with open(self.filename, "rb") as f:
    113                 data = f.read()
    114         if self.file is not None:

IOError: [Errno 2] No such file or directory: 'swift://notebooks.spark/Warehousing-data.json'

This notebook works just fine when I run it locally and resource_path is a path on my own machine.
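This matches the bottom of the stack trace: the failure happens inside Python's built-in open(), which only understands local filesystem paths and knows nothing about the Hadoop swift driver. A standalone reproduction of the same error, using only the standard library:

```python
import errno

# open() treats "swift://notebooks.spark/..." as a literal local path, so it
# fails with errno 2 (ENOENT) no matter how the Hadoop configuration is set --
# the swift driver is never consulted.
try:
    open("swift://notebooks.spark/Warehousing-data.json", "rb")
except IOError as e:
    print(e.errno == errno.ENOENT)  # True
```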

Sven, your code seems pretty much identical to what I have, and it follows closely the sample I copied, so I do not understand why yours works and mine doesn’t.

I have verified that the files are present on my Instance_objectstore. Therefore it seems that swift://notebooks.spark/ does not point to this object store. How that would happen has been a mystery to me from the start. Again, the HadoopConfiguration class is not documented, so it is not possible to know how it makes the association between the URL and the object store.

Recommended answer

I found another example at https://github.com/saviosaldanha/IBM_Object_Store_Python_Example/blob/master/storage_recipe_example.py

Here is the revised code:

import swiftclient
from keystoneclient import client

# Object Store credentials (generated by Insert to code)
credentials = {
  'auth_url':'https://identity.open.softlayer.com',
  'project':'***',
  'project_id':'***',
  'region':'dallas',
  'user_id':'***',
  'domain_id':'***',
  'domain_name':'***',
  'username':'***',
  'password':"""***""",
  'filename':'Warehousing-data.json',
  'container':'notebooks',
  'tenantId':'***'
}

# Establish Connection to Bluemix Object Store
connection = swiftclient.Connection(
    key=credentials['password'],
    authurl=credentials['auth_url'],
    auth_version='3',
    os_options={"project_id": credentials['project_id'],
                "user_id": credentials['user_id'],
                "region_name": credentials['region']})

# The data files should now be accessible through calls of the form
# connection.get_object(credentials[container], fileName)[1]

Then the files are accessed as:

Warehousing_data_json = "Warehousing-data.json"
Warehousing_sales_data_nominal_scenario_json = "Warehousing-sales_data-nominal_scenario.json"

resp = client.execute(input= [{'name': "warehousing.mod",
                               'file': StringIO(warehousing_data_dotmod + warehousing_inputs + warehousing_dotmod + warehousing_outputs)},
                              {'name': Warehousing_data_json,
                               'filename': connection.get_object(credentials['container'], Warehousing_data_json)[1]},
                              {'name': Warehousing_sales_data_nominal_scenario_json,
                               'filename': connection.get_object(credentials['container'], Warehousing_sales_data_nominal_scenario_json)[1]}],
                              output= "results.json",
                              load_solution= True,
                              log= "solver.log",
                              gzip= True,
                              waittime= 300,
                              delete_on_completion= True)
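One caveat with this revision (an observation, not a tested fix): python-swiftclient's get_object() returns a (response_headers, object_body) tuple, so element [1] is the object's raw bytes, while DOcplexcloud's 'filename' field is eventually passed to open() and therefore expects a path. Wrapping the bytes in a file-like object under the 'file' key may be closer to what the API wants; a sketch using a stand-in for the downloaded body:

```python
from io import BytesIO

# Stand-in for connection.get_object(credentials['container'], name)[1],
# which yields the object body as bytes:
body = b'{"warehouses": []}'

# 'filename' is opened with open(), so raw bytes there cannot work; a
# file-like object under the 'file' key bypasses the local filesystem.
attachment = {'name': 'Warehousing-data.json', 'file': BytesIO(body)}
print(attachment['file'].read())  # b'{"warehouses": []}'
```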

The problem now is how to load the swiftclient and keystoneclient libraries in Bluemix. Pip doesn't seem to work in the notebook. Does anyone know how to handle this?
