Dataproc: import a Python module stored in a Google Cloud Storage (GCS) bucket
Question
I have the following structure in a GCS bucket:

```
my_bucket/notebooks/jupyter/
    modules/
        __init__.py
        mymodule.py
    notebook_1.ipynb
```
How do I import mymodule in notebook_1.ipynb? (notebook_1.ipynb is a plain Python notebook, NOT a Spark notebook.)
I'm afraid that isn't possible directly, since the module needs to be either in the directory where you are running the script or on your sys.path.
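To illustrate the sys.path part: Python searches every directory on sys.path when resolving an import, and you can extend that list at runtime. A minimal sketch (the `modules` directory name here is just a hypothetical example):

```python
import os
import sys

# Hypothetical local directory; any directory appended to sys.path
# is searched when Python resolves subsequent imports.
module_dir = os.path.abspath("modules")
sys.path.append(module_dir)

print(module_dir in sys.path)  # -> True
```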
As an option, you can implement a function that downloads the module from Cloud Storage, uses its functionality, and then removes the local copy.
Here is a simple example that I wrote for testing purposes:
greetings.py (the file that I stored in my bucket):
```python
def say_hello(name):
    return "Hello {}!".format(name)


def say_hi(name):
    return "Hi {}!".format(name)
```
main.py:
```python
from google.cloud import storage
import os


def get_module():
    """
    Instantiate a Storage client and return the blob located in the bucket.
    """
    client = storage.Client()
    bucket = client.get_bucket('<my-bucket-name>')
    return bucket.blob('greetings.py')


def use_my_module(my_method, val):
    """
    Download the module, use it, and then remove the local copy.
    """
    blob = get_module()
    blob.download_to_filename('my_module.py')
    import my_module  # works because the current directory is on sys.path
    result = getattr(my_module, my_method)(val)
    os.remove('my_module.py')
    return result


print(use_my_module('say_hello', 'User 1'))
print(use_my_module('say_hi', 'User 2'))
```
Output:
```
Hello User 1!
Hi User 2!
```
I cannot say whether the example above will be efficient for your scenario, but I hope it gives you some ideas.
Edit:
Regarding the situation where your module is located in a subdirectory of the directory containing your script (notebook_1.ipynb): you can import the module like this:

```python
import modules.mymodule
```
Then you can call its functions with the following syntax:

```
modules.mymodule.<your-method>
```
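For completeness, here is a self-contained sketch of that package-style import. The files are created in the snippet only so it runs on its own (in practice, modules/ already sits next to notebook_1.ipynb), and greet() is a hypothetical function name standing in for whatever mymodule.py defines:

```python
import os

# Recreate the layout from the question locally, for demonstration only.
os.makedirs("modules", exist_ok=True)
with open("modules/__init__.py", "w") as f:
    f.write("")
with open("modules/mymodule.py", "w") as f:
    f.write("def greet(name):\n    return 'Hello {}!'.format(name)\n")

# The subdirectory with an __init__.py is a regular package, so it can
# be imported with dotted notation from the script's directory.
import modules.mymodule

print(modules.mymodule.greet("User 1"))  # -> Hello User 1!
```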