Dataproc: import a Python module stored in a Google Cloud Storage (GCS) bucket


Question

I have the following structure in a GCS bucket:

  1. my_bucket/notebooks/jupyter/
    • modules
      • mymodule.py
      • __init__.py
    • notebook_1.ipynb

How do I import mymodule in notebook_1.ipynb? (notebook_1.ipynb is a Python notebook, NOT a Spark notebook.)

Answer

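I'm afraid this is not possible directly. By default, Python can only import a module that is available on the local filesystem, either in the directory where you are running the script or in one of the directories listed in sys.path. A file that exists only in a bucket is in neither place.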
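
To illustrate the sys.path rule, here is a minimal sketch (not part of the original answer). It writes a throwaway module into a temporary directory and makes it importable by appending that directory to sys.path. The module name scratch_greetings and its contents are hypothetical, chosen only for this demonstration.

```python
import importlib
import os
import sys
import tempfile

# Create a directory that is NOT on sys.path yet and put a tiny module in it.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "scratch_greetings.py"), "w") as f:
    f.write("def say_hello(name):\n    return 'Hello {}!'.format(name)\n")

# Without this line, the import below would raise ModuleNotFoundError.
sys.path.append(module_dir)
importlib.invalidate_caches()  # make sure the freshly written file is noticed

import scratch_greetings

message = scratch_greetings.say_hello("User 1")
print(message)
```

The same idea applies to a module fetched from a bucket: once the file is in a local directory that is on sys.path, a normal import statement works.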


As a workaround, you can write a function that downloads the module from Cloud Storage to a local file, imports it, uses its functionality, and then removes the file.

Here is a simple example that I wrote for testing purposes:

greetings.py (the file that I stored in my bucket):

def say_hello(name):
    return "Hello {}!".format(name)


def say_hi(name):
    return "Hi {}!".format(name)

main.py:

from google.cloud import storage
import os


def get_module():
    """
    Instantiate Storage Client and return the blob located in the bucket.
    """
    client = storage.Client()
    bucket = client.get_bucket('<my-bucket-name>')
    return bucket.blob('greetings.py')


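def use_my_module(my_method, val):
    """
    Download the module, call one of its functions and then remove the file.
    """
    blob = get_module()
    # Save the blob locally under an importable file name.
    blob.download_to_filename('my_module.py')
    import my_module  # cached in sys.modules after the first call

    # Look up the function by name and call it with the given value.
    result = getattr(my_module, my_method)(val)
    os.remove('my_module.py')
    return result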


print(use_my_module('say_hello', 'User 1'))
print(use_my_module('say_hi', 'User 2'))

Output:

Hello User 1!
Hi User 2!


I can't say whether the example above will be efficient for your scenario, but I hope it gives you some ideas.
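
As a possible variation on the download-and-import idea, the sketch below builds the module directly from its source text instead of writing a temporary file. This is my own illustration, not the answer author's method. The helper names load_module_from_source and load_module_from_gcs are hypothetical, and the GCS helper assumes the google-cloud-storage package is installed and credentials are configured. The demonstration at the bottom uses an inline copy of greetings.py so that it runs without a bucket.

```python
import sys
import types


def load_module_from_source(module_name, source):
    """Build a module object from Python source text, without touching disk."""
    module = types.ModuleType(module_name)
    # Compile and execute the source inside the new module's namespace.
    exec(compile(source, module_name + ".py", "exec"), module.__dict__)
    sys.modules[module_name] = module  # later `import module_name` finds it
    return module


def load_module_from_gcs(bucket_name, blob_name, module_name):
    """Fetch the source from Cloud Storage and load it (hypothetical helper)."""
    # Imported lazily so the rest of this sketch runs without the library.
    from google.cloud import storage

    blob = storage.Client().bucket(bucket_name).blob(blob_name)
    return load_module_from_source(module_name, blob.download_as_text())


# Local demonstration with the same source as greetings.py, no bucket needed.
GREETINGS_SOURCE = '''
def say_hello(name):
    return "Hello {}!".format(name)
'''

greetings = load_module_from_source("greetings", GREETINGS_SOURCE)
print(greetings.say_hello("User 1"))
```

With a real bucket, the call would look like load_module_from_gcs('<my-bucket-name>', 'greetings.py', 'greetings'). Because the source is executed again on every call, a changed file in the bucket is picked up on the next load, whereas a plain import statement keeps returning the copy cached in sys.modules.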


Edit:

If your module is located in a sub-directory of the directory that contains your script (notebook_1.ipynb), you can import it like this:

import modules.mymodule

Then you can call its functions using the following form:

modules.mymodule.<your-method>
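
Note that this package-style import relies on the modules directory, including its __init__.py, being present on the local filesystem next to the running notebook. The sketch below is an illustration under that assumption: it recreates the layout from the question in a temporary directory and imports the module from there. The function greet is hypothetical and stands in for whatever mymodule.py really defines.

```python
import importlib
import os
import sys
import tempfile

# Recreate the layout from the question in a scratch directory:
#   <base_dir>/modules/__init__.py
#   <base_dir>/modules/mymodule.py
base_dir = tempfile.mkdtemp()
package_dir = os.path.join(base_dir, "modules")
os.makedirs(package_dir)
open(os.path.join(package_dir, "__init__.py"), "w").close()
with open(os.path.join(package_dir, "mymodule.py"), "w") as f:
    f.write("def greet(name):\n    return 'Hi {}!'.format(name)\n")

# Stand-in for "the notebook runs from base_dir": put it first on sys.path.
sys.path.insert(0, base_dir)
importlib.invalidate_caches()

import modules.mymodule

result = modules.mymodule.greet("User 2")
print(result)
```

If the files exist only in the bucket, they would first have to be copied to a local directory, for example with the download approach shown earlier.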
