Using NLTK corpora with AWS Lambda functions in Python


Question

I'm having difficulty using NLTK corpora (in particular stop words) in AWS Lambda. I'm aware that the corpora need to be downloaded. I have done so with nltk.download('stopwords') and included them, under nltk_data/corpora/stopwords, in the zip file used to upload the Lambda modules.

The usage in the code is as follows:

from nltk.corpus import stopwords
stopwords = stopwords.words('english')
nltk.data.path.append("/nltk_data")

This returns the following error in the Lambda log output:

module initialization error: 
**********************************************************************
  Resource u'corpora/stopwords' not found.  Please use the NLTK
  Downloader to obtain the resource:  >>> nltk.download()
  Searched in:
    - '/home/sbx_user1062/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/nltk_data'
**********************************************************************

I have also tried to load the data directly by including

nltk.data.load("/nltk_data/corpora/stopwords/english")

which gives a different error:

module initialization error: Could not determine format for file:///stopwords/english based on its file
extension; use the "format" argument to specify the format explicitly.

It's possible that NLTK has a problem loading the data from the Lambda zip and needs it stored externally, say on S3, but that seems a bit strange.

Any ideas?

Does anyone know where I could be going wrong?

Recommended answer

I had the same problem before, but I solved it using an environment variable.

  1. Run "nltk.download()" and copy the downloaded data into the root folder of your AWS Lambda application. (The folder should be called "nltk_data".)
  2. In the user interface of your Lambda function (in the AWS console), add the environment variable "NLTK_DATA" = "./nltk_data".
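
If you would rather not depend on the console setting, the same idea can probably be done in code. In the question's snippet the search path is appended only after stopwords.words() has already run, and "/nltk_data" points at the filesystem root rather than at the unpacked deployment package. Below is a minimal sketch, assuming the "nltk_data" folder sits at the root of the zip and that the LAMBDA_TASK_ROOT variable set by the Lambda runtime points at the unpacked package. The helper names bundled_nltk_data_dir and load_english_stopwords, and the "text" key of the event, are made up for illustration.

```python
import os


def bundled_nltk_data_dir(environ=None):
    """Return the path of the nltk_data folder shipped with the function."""
    environ = os.environ if environ is None else environ
    # Assumption: LAMBDA_TASK_ROOT is set by the Lambda runtime.
    # Fall back to the current working directory for local runs.
    task_root = environ.get("LAMBDA_TASK_ROOT", os.getcwd())
    return os.path.join(task_root, "nltk_data")


def load_english_stopwords():
    """Register the bundled data folder, then read the stop word list."""
    import nltk  # imported lazily so the path helper works without NLTK

    data_dir = bundled_nltk_data_dir()
    if data_dir not in nltk.data.path:
        # The path must be registered BEFORE the corpus is first read.
        nltk.data.path.append(data_dir)

    from nltk.corpus import stopwords
    return set(stopwords.words("english"))


def lambda_handler(event, context):
    # Hypothetical handler: drop stop words from event["text"].
    stop_words = load_english_stopwords()
    tokens = event.get("text", "").lower().split()
    return {"tokens": [t for t in tokens if t not in stop_words]}
```

Either approach should work on its own: the NLTK_DATA environment variable is read when nltk is imported, while the code above extends nltk.data.path at run time.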
