Any way to import Python's nltk.download('punkt') into Google Cloud Functions?

Question

Is there any way to make the data from Python's nltk.download('punkt') available in Google Cloud Functions? I've found that calling it directly in my code in main.py significantly slows down my function, since punkt has to be downloaded every time the function runs. Is there a way to avoid this by providing punkt some other way?

EDIT #1: I edited my code and program structure to match what Barak suggested, but I keep getting the same error:

Error: function terminated. Recommended action: inspect logs for termination reason. Details:

**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt/PY3/english.pickle

  Searched in:
    - '/tmp/nltk_data'
    - '/env/nltk_data'
    - '/env/share/nltk_data'
    - '/env/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - ''
**********************************************************************

Recommended answer

Take a look at the instructions for uploading files with your Cloud Function. Specifically, since you can upload files alongside your code, you can configure nltk to use those files instead of downloading them:

Following the official NLTK documentation, you can "Set your NLTK_DATA environment variable to point to your top level nltk_data folder."

Combining these together, you'd get:

  1. On your computer, download the data with python -m nltk.downloader punkt
  2. Upload the NLTK directory (the documentation above explains where to find its path on your computer) as an nltk_data directory at the root of your function environment
  3. Configure the code to find that folder:

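import os
# Resolve the nltk_data folder that was deployed next to this file.
root = os.path.dirname(os.path.abspath(__file__))
nltk_dir = os.path.join(root, 'nltk_data')  # Your folder name here
# Set this before importing nltk, which reads NLTK_DATA at import time.
os.environ['NLTK_DATA'] = nltk_dir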
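
If the function still reports "Resource punkt not found", one thing worth ruling out is that the uploaded folder is missing or nested one level too deep. The following is a minimal, standard-library-only sketch of such a check. Note that has_punkt is a hypothetical helper name, and the tokenizers/punkt sub-path is taken from the error message in the question, so treat both as assumptions rather than part of the original answer.

```python
import os


def has_punkt(nltk_dir):
    """Return True if nltk_dir contains a non-empty tokenizers/punkt folder.

    has_punkt is a hypothetical helper; the sub-path mirrors the one NLTK
    reported in the error message ("tokenizers/punkt/...").
    """
    punkt_dir = os.path.join(nltk_dir, 'tokenizers', 'punkt')
    # os.listdir raises on a missing folder, so test for the directory first.
    return os.path.isdir(punkt_dir) and len(os.listdir(punkt_dir)) > 0
```

Calling has_punkt on the folder next to main.py and logging the result at start-up shows whether the data was deployed where the code expects it.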

It seems that exporting the path through the environment variable doesn't achieve the desired effect, so let's make the path explicit in the code:

  • On your computer, download the data:

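import os
import nltk

# Download punkt into a local folder that will be deployed with the function.
download_dir = os.path.abspath('my_nltk_dir')
os.makedirs(download_dir, exist_ok=True)  # Don't fail if the folder already exists
nltk.download('punkt', download_dir=download_dir)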

  • Put the my_nltk_dir directory in the same folder as your Python script. The layout would be:

    PROJECT_ROOT/
    |-- my_code.py
    |-- my_nltk_dir/
        |-- ...
    

  • In your code, refer to the data using:

    import os
    import nltk.data

    # Build the path to the bundled data relative to this file.
    root = os.path.dirname(os.path.abspath(__file__))
    download_dir = os.path.join(root, 'my_nltk_dir')
    tokenizer = nltk.data.load(
        os.path.join(download_dir, 'tokenizers/punkt/english.pickle')
    )
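
Loading english.pickle by absolute path gives you a tokenizer object, but helpers such as nltk.sent_tokenize look resources up through NLTK's search path instead. An alternative sketch is to add the bundled folder to nltk.data.path, the list behind the "Searched in" section of the error message. This is an illustration under stated assumptions, not the answer's own method: prepend_data_dir and configure_nltk are hypothetical helper names, and my_nltk_dir is the folder name used above.

```python
import os


def prepend_data_dir(search_path, data_dir):
    """Put data_dir at the front of an NLTK-style search path list, once.

    Hypothetical helper: it only manipulates the list it is given.
    """
    data_dir = os.path.abspath(data_dir)
    if data_dir not in search_path:
        search_path.insert(0, data_dir)
    return search_path


def configure_nltk(base_file):
    """Register the my_nltk_dir folder that sits next to base_file."""
    import nltk  # imported here so the helper above stays usable on its own

    root = os.path.dirname(os.path.abspath(base_file))
    # nltk.data.path is the list of directories NLTK searches for resources.
    prepend_data_dir(nltk.data.path, os.path.join(root, 'my_nltk_dir'))
```

Calling configure_nltk(__file__) at module level in main.py, outside the request handler, should run it once when the function instance starts rather than on every request.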
    
