Dataproc: import a Python module stored in a Google Cloud Storage (GCS) bucket


Question

I have the following structure in a GCS bucket:

  1. my_bucket/notebooks/jupyter/
    • modules
      • mymodule.py
      • __init__.py

How do I import mymodule in notebook_1.ipynb? (notebook_1.ipynb is a Python notebook, NOT a Spark notebook.)

Recommended answer

I'm afraid that isn't possible directly: to import a module, it has to be either in the directory where you are running the script or in a directory listed in your sys.path.
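To illustrate the point about sys.path, here is a minimal sketch using only the standard library. The temporary directory and the module name `local_greetings` are made up for this illustration; in practice the directory would be wherever the module file has been copied to on the machine running the notebook.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical local directory standing in for "wherever the module was copied to".
module_dir = tempfile.mkdtemp()

# Write a tiny module there (the name is chosen for this illustration only).
with open(os.path.join(module_dir, "local_greetings.py"), "w") as f:
    f.write("def say_hello(name):\n    return 'Hello {}!'.format(name)\n")

# Python only searches the directories listed in sys.path, so add ours.
if module_dir not in sys.path:
    sys.path.insert(0, module_dir)

importlib.invalidate_caches()  # make sure the freshly written file is seen
local_greetings = importlib.import_module("local_greetings")

greeting = local_greetings.say_hello("User 1")
print(greeting)
```

A gs:// path cannot be added to sys.path this way, because the import system looks for files on the local filesystem; this is why the module has to be downloaded first.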

As an option, you can implement a function that downloads the module from your Cloud Storage bucket, uses its functionality, and then removes the local copy.

Here is a simple example that I wrote for testing purposes:

greetings.py (the file that I stored in my bucket):

def say_hello(name):
    return "Hello {}!".format(name)


def say_hi(name):
    return "Hi {}!".format(name)

main.py:

from google.cloud import storage
import os


def get_module():
    """
    Instantiate Storage Client and return the blob located in the bucket.
    """
    client = storage.Client()
    bucket = client.get_bucket('<my-bucket-name>')
    return bucket.blob('greetings.py')


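def use_my_module(my_method, val):
    """
    Download the module, call one of its functions and then remove the file.
    """
    blob = get_module()
    # The file lands in the current working directory, so that directory
    # needs to be on sys.path for the import below to work.
    blob.download_to_filename('my_module.py')
    try:
        import my_module

        result = getattr(my_module, my_method)(val)
    finally:
        # Remove the local copy even if the import or the call fails.
        os.remove('my_module.py')
    return result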


print(use_my_module('say_hello', 'User 1'))
print(use_my_module('say_hi', 'User 2'))

Output:

Hello User 1!
Hi User 2!


I cannot say whether the example above will be efficient for your scenario, but I hope it gives you some ideas.
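A variant of the same idea, sketched here as an assumption rather than as part of the original answer, loads the downloaded file by its path with importlib, so it does not depend on the working directory being on sys.path. The helper name `load_module_from_path` is hypothetical, and a locally written file stands in for the one that `blob.download_to_filename(...)` would produce.

```python
import importlib.util
import os
import tempfile


def load_module_from_path(module_name, file_path):
    """Load a Python source file as a module without touching sys.path."""
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Stand-in for the downloaded file: with GCS this file would be written by
# blob.download_to_filename(local_path) instead of by open()/write().
with tempfile.TemporaryDirectory() as tmp_dir:
    local_path = os.path.join(tmp_dir, "greetings.py")
    with open(local_path, "w") as f:
        f.write("def say_hi(name):\n    return 'Hi {}!'.format(name)\n")

    greetings = load_module_from_path("greetings", local_path)

# The module object stays usable after the temporary file is gone.
message = greetings.say_hi("User 2")
print(message)
```

Using a temporary directory also takes care of the clean-up step, since the directory and the file in it are removed when the `with` block ends.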

If your module is located in a sub-directory of the directory that contains your script (notebook_1.ipynb), you can import the module like this:

import modules.mymodule

You can then call its functions like this:

modules.mymodule.<your-method>
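As a rough, self-contained sketch of that layout, the snippet below recreates the modules/ package in a temporary directory on the local filesystem and imports it. The function `greet` is hypothetical, since the contents of mymodule.py are not shown in the question, and the temporary directory stands in for the directory that holds the notebook.

```python
import importlib
import os
import sys
import tempfile

# Recreate the layout from the question in a temporary directory:
#   <base_dir>/modules/__init__.py
#   <base_dir>/modules/mymodule.py
base_dir = tempfile.mkdtemp()
package_dir = os.path.join(base_dir, "modules")
os.makedirs(package_dir)

# An empty __init__.py marks the directory as a regular package.
open(os.path.join(package_dir, "__init__.py"), "w").close()

# greet() is a hypothetical function; the real mymodule.py is not shown.
with open(os.path.join(package_dir, "mymodule.py"), "w") as f:
    f.write("def greet(name):\n    return 'Hello {}!'.format(name)\n")

# Stand-in for "the directory with your script": it must be on sys.path.
sys.path.insert(0, base_dir)
importlib.invalidate_caches()

import modules.mymodule

result = modules.mymodule.greet("User 1")
print(result)
```

This only works once the files exist on the local filesystem, so for files kept in a bucket the download step described earlier still applies.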
