Can't access Bluemix object store from my Notebook

Question

I'm trying to read a couple of JSON files from my Bluemix object store into a Jupyter notebook using Python. I've followed the examples I've found, but I'm still getting a "No such file or directory" error.

Here is the code that should authenticate the object store and identify the files:

# Set up Spark
from pyspark import SparkContext
from pyspark import SparkConf

if('config' not in globals()):
    config = SparkConf().setAppName('warehousing_sql').setMaster('local')
if('sc' not in globals()):
    sc= SparkContext(conf=config)

# Set the Hadoop configuration.
def set_hadoop_config(name, credentials):
    prefix = "fs.swift.service." + name
    hconf = sc._jsc.hadoopConfiguration()
    hconf.set(prefix + ".auth.url", credentials['auth_url']+'/v3/auth/tokens')
    hconf.set(prefix + ".auth.endpoint.prefix", "endpoints")
    hconf.set(prefix + ".tenant", credentials['project_id'])
    hconf.set(prefix + ".username", credentials['user_id'])
    hconf.set(prefix + ".password", credentials['password'])
    hconf.setInt(prefix + ".http.port", 8080)
    hconf.set(prefix + ".region", credentials['region'])
    hconf.setBoolean(prefix + ".public", True)

# Data Sources (generated by Insert to code)
# Note: this must be at top level, not indented under the function above.
credentials = {
  'auth_url':'https://identity.open.softlayer.com',
  'project':'***',
  'project_id':'****',
  'region':'dallas',
  'user_id':'****',
  'domain_id':'****',
  'domain_name':'****',
  'username':'****',
  'password':"""****""",
  'filename':'Warehousing-data.json',
  'container':'notebooks',
  'tenantId':'****'
}

set_hadoop_config('spark', credentials)

# The data files should now be accessible through URLs of the form
# swift://notebooks.spark/filename.json

Here is the calling code:

...
resource_path= "swift://notebooks.spark/"
Warehousing_data_json = "Warehousing-data.json"
Warehousing_sales_data_nominal_scenario_json = "Warehousing-sales_data-nominal_scenario.json"
...

Here is the error: IOError: [Errno 2] No such file or directory: 'swift://notebooks.spark/Warehousing-data.json'
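The root cause is visible in the error itself: a `swift://` URL is only meaningful to the Hadoop swift filesystem driver that `set_hadoop_config` configures (i.e. through Spark calls such as `sc.textFile`), while Python's built-in `open()`, which docloud ultimately calls, treats the string as a literal local path. A minimal sketch of that behavior (the path is the hypothetical one from the question):

```python
# open() knows nothing about swift:// URLs; it looks for a local file
# literally named "swift://notebooks.spark/Warehousing-data.json",
# which does not exist, hence IOError errno 2.
def try_open(path):
    try:
        with open(path, "rb") as f:
            return f.read()
    except IOError as e:  # FileNotFoundError on Python 3
        return "IOError: %s" % e.strerror

print(try_open("swift://notebooks.spark/Warehousing-data.json"))
# -> IOError: No such file or directory
```

So the Hadoop configuration in the question is not wrong; it just configures a layer (Spark/Hadoop) that the docloud client never goes through.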

I'm sorry if this seems like a novice question (which I admit I am), but I think it's ridiculously complicated to set this up and really bad form to rely on an undocumented method SparkContext._jsc.hadoopConfiguration().

Added in response to Hobert's and Sven's comments:

Thanks Hobert. I don't understand your comment about the definition for `swift://notebooks.spark/`. Unless I misunderstand the logic of the sample I followed (which is essentially identical to what Sven shows in his response), this path results from the call to sc._jsc.hadoopConfiguration(), but it's hard to know what this call actually does, since the HadoopConfiguration class is not documented.

I also do not understand the alternatives to "use/add that definition for the Hadoop configuration" or "alternatively, … use swift client inside of Spark to access the JSON." I suppose I would prefer the latter since I make no other use of Hadoop in my notebook. Please point me to a more detailed explanation of these alternatives.

Thanks Sven. You are correct that I did not show the actual reading of the JSON files. The reading actually occurs within a method that is part of the API for DOcplexcloud. Here is the relevant code in my notebook:

resource_path= "swift://notebooks.spark/"
Warehousing_data_json = "Warehousing-data.json"
Warehousing_sales_data_nominal_scenario_json = "Warehousing-sales_data-nominal_scenario.json"

resp = client.execute(input= [{'name': "warehousing.mod",
                               'file': StringIO(warehousing_data_dotmod + warehousing_inputs + warehousing_dotmod + warehousing_outputs)},
                              {'name': Warehousing_data_json,
                               'filename': resource_path + Warehousing_data_json},
                              {'name': Warehousing_sales_data_nominal_scenario_json,
                               'filename': resource_path + Warehousing_sales_data_nominal_scenario_json}],
                      output= "results.json",
                      load_solution= True,
                      log= "solver.log",
                      gzip= True,
                      waittime= 300,
                      delete_on_completion= True)

Here is the stack trace:

IOError                                   Traceback (most recent call last)
<ipython-input-8-67cf709788b3> in <module>()
     29                       gzip= True,
     30                       waittime= 300,
---> 31                       delete_on_completion= True)
     32 
     33 result = WarehousingResult(json.loads(resp.solution.decode("utf-8")))

/gpfs/fs01/user/sbf1-4c17d3407da8d0-a7ea98a5cc6d/.local/lib/python2.7/site-packages/docloud/job.pyc in execute(self, input, output, load_solution, log, delete_on_completion, timeout, waittime, gzip, parameters)
    496         # submit job
    497         jobid = self.submit(input=input, timeout=timeout, gzip=gzip,
--> 498                             parameters=parameters)
    499         response = None
    500         completed = False

/gpfs/fs01/user/sbf1-4c17d3407da8d0-a7ea98a5cc6d/.local/lib/python2.7/site-packages/docloud/job.pyc in submit(self, input, timeout, gzip, parameters)
    436                                 gzip=gzip,
    437                                 timeout=timeout,
--> 438                                 parameters=parameters)
    439         # run model
    440         self.execute_job(jobid, timeout=timeout)

/gpfs/fs01/user/sbf1-4c17d3407da8d0-a7ea98a5cc6d/.local/lib/python2.7/site-packages/docloud/job.pyc in create_job(self, **kwargs)
    620                 self.upload_job_attachment(job_id, 
    621                                            attid=inp.name,
--> 622                                            data=inp.get_data(),
    623                                            gzip=gzip)
    624         return job_id

/gpfs/fs01/user/sbf1-4c17d3407da8d0-a7ea98a5cc6d/.local/lib/python2.7/site-packages/docloud/job.pyc in get_data(self)
    110         data = self.data
    111         if self.filename is not None:
--> 112             with open(self.filename, "rb") as f:
    113                 data = f.read()
    114         if self.file is not None:

IOError: [Errno 2] No such file or directory: 'swift://notebooks.spark/Warehousing-data.json'

This notebook works just fine when I run it locally and resource_path is a path on my own machine.

Sven, your code seems pretty much identical to what I have, and it follows closely the sample I copied, so I do not understand why yours works and mine doesn’t.

I have verified that the files are present in my Instance_objectstore. Therefore it seems that swift://notebooks.spark/ does not point to this object store. How that could happen has been a mystery to me from the start. Again, the HadoopConfiguration class is not documented, so it is not possible to know how it makes the association between the URL and the object store.
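For what it's worth, the association is purely a naming convention: in `swift://CONTAINER.SERVICE/object`, the part of the host before the dot is the object-store container and the part after it must match the `name` argument passed to `set_hadoop_config` (here `'spark'`), which selects the `fs.swift.service.<name>.*` properties. A small sketch of that parsing (the helper is illustrative, not part of any API):

```python
from urllib.parse import urlparse

def split_swift_url(url):
    # swift://CONTAINER.SERVICE/object -> (container, service, object name)
    parsed = urlparse(url)
    container, service = parsed.netloc.split(".", 1)
    return container, service, parsed.path.lstrip("/")

print(split_swift_url("swift://notebooks.spark/Warehousing-data.json"))
# -> ('notebooks', 'spark', 'Warehousing-data.json')
```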

Answer

I found sample code in the recipe at https://developer.ibm.com/recipes/tutorials/using-ibm-object-storage-in-bluemix-with-python/ and in https://github.com/saviosaldanha/IBM_Object_Store_Python_Example/blob/master/storage_recipe_example.py

Here is the revised code:

import swiftclient
import keystoneclient  # note: `from keystoneclient import client` would shadow the docloud `client` used below

# Object Store credentials (generated by Insert to code)
credentials = {
  'auth_url':'https://identity.open.softlayer.com',
  'project':'***',
  'project_id':'***',
  'region':'dallas',
  'user_id':'***',
  'domain_id':'***',
  'domain_name':'***',
  'username':'***',
  'password':"""***""",
  'filename':'Warehousing-data.json',
  'container':'notebooks',
  'tenantId':'***'
}

# Establish connection to the Bluemix Object Store
# (dictionary keys must be quoted strings, and must match the names in
# the credentials dict above: 'project_id', 'user_id', 'region')
connection = swiftclient.Connection(
    key=credentials['password'],
    authurl=credentials['auth_url'],
    auth_version='3',
    os_options={"project_id": credentials['project_id'],
                "user_id": credentials['user_id'],
                "region_name": credentials['region']})

# The data files should now be accessible through calls of the form
# connection.get_object(credentials['container'], fileName)[1]

The files are then accessed as:

Warehousing_data_json = "Warehousing-data.json"
Warehousing_sales_data_nominal_scenario_json = "Warehousing-sales_data-nominal_scenario.json"

resp = client.execute(input= [{'name': "warehousing.mod",
                               'file': StringIO(warehousing_data_dotmod + warehousing_inputs + warehousing_dotmod + warehousing_outputs)},
                              {'name': Warehousing_data_json,
                               'filename': connection.get_object(credentials['container'], Warehousing_data_json)[1]},
                              {'name': Warehousing_sales_data_nominal_scenario_json,
                               'filename': connection.get_object(credentials['container'], Warehousing_sales_data_nominal_scenario_json)[1]}],
                              output= "results.json",
                              load_solution= True,
                              log= "solver.log",
                              gzip= True,
                              waittime= 300,
                              delete_on_completion= True)

The remaining problem is how to load the swiftclient and keystoneclient libraries in Bluemix. Pip doesn't seem to work in the notebook. Does anyone know how to handle this?
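On the installation question: in Bluemix/DSX notebooks, installing into the user site-packages from a notebook cell is the commonly suggested route (this depends on the notebook environment; package names assumed to be the PyPI ones):

```shell
!pip install --user python-swiftclient python-keystoneclient
```

A kernel restart is typically needed after the install before the imports succeed.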
