Lambda not supporting NLTK file size
Problem description
I am writing a Python script that analyzes a piece of text and returns the data in JSON format. I am using NLTK to analyze the data. Basically, this is my flow:
Create an endpoint (API Gateway) -> call my Lambda function -> return JSON of the required data.
I wrote my script and deployed it to Lambda, but I ran into this issue:
Resource \u001b[93mpunkt\u001b[0m not found. Please use the NLTK Downloader to obtain the resource:
\u001b[31m>>> import nltk
nltk.download('punkt') \u001b[0m
Searched in:
- '/home/sbx_user1058/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- '/var/lang/nltk_data'
- '/var/lang/lib/nltk_data'
Even after downloading 'punkt', my script still gave me the same error. I tried the solutions here:
but the issue is that the nltk_data folder is huge, while Lambda has a size restriction.
How can I fix this issue? Or where else can I run my script and still integrate the API call?
I am using Serverless to deploy my Python scripts.
Recommended answer
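You can do two things:
- The error suggests the data path is not defined properly; maybe set it explicitly, or as an env variable? NLTK looks up resources through nltk.data.path rather than sys.path, so append the folder there:
nltk.data.path.append(os.path.abspath('/var/task/nltk_data/'))
or do it this way: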
- Once you have run nltk.download(), copy the downloaded data to the root folder of your AWS Lambda application. (Name the directory "nltk_data".)
In the Lambda function dashboard (in the AWS console), add NLTK_DATA=./nltk_data as a key-value environment variable.
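The same setting can also be applied from code instead of the console. The following is a minimal sketch, assuming the nltk_data folder is bundled at the root of the deployment package; the names bundled_nltk_data_dir, configure_nltk_data and handler, and the "text" field of the event, are hypothetical. It relies on NLTK reading the NLTK_DATA environment variable when the package is imported, so the variable is set before the import.

```python
import json
import os


def bundled_nltk_data_dir(task_root=None):
    """Return the path of the nltk_data folder shipped with the function."""
    # LAMBDA_TASK_ROOT is set by the Lambda runtime (normally /var/task).
    # Outside Lambda, fall back to the current working directory.
    root = task_root or os.environ.get("LAMBDA_TASK_ROOT") or os.getcwd()
    return os.path.join(root, "nltk_data")


def configure_nltk_data(task_root=None):
    """Put the bundled folder first in NLTK_DATA, keeping existing entries."""
    data_dir = bundled_nltk_data_dir(task_root)
    existing = [p for p in os.environ.get("NLTK_DATA", "").split(os.pathsep) if p]
    if data_dir not in existing:
        os.environ["NLTK_DATA"] = os.pathsep.join([data_dir] + existing)
    return data_dir


# Must run before `import nltk`, because NLTK reads NLTK_DATA at import time.
configure_nltk_data()


def handler(event, context):
    # Imported here so the environment variable above is already in place.
    import nltk

    # Assumption: the event carries the text to analyze under "text".
    tokens = nltk.word_tokenize(event.get("text", ""))
    return {"statusCode": 200, "body": json.dumps({"tokens": tokens})}
```

Using an absolute path avoids depending on the working directory that a relative value such as ./nltk_data is resolved against. Depending on the NLTK version, the tokenizer may look for a differently named resource than punkt, so check the name in the error message.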
- Reduce the size of the NLTK downloads, since you won't need all of them.
- Delete all the zip files and keep only the parts you need. For example, if you only need stopwords, keep nltk_data/corpora/stopwords and delete the rest. Or, if you need tokenizers, keep nltk_data/tokenizers/punkt. Most of these can be downloaded separately, e.g. python -m nltk.downloader punkt, and then you copy over the files.
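The pruning step can be scripted so it runs before each deployment. This is a rough sketch, not part of NLTK: prune_nltk_data is a hypothetical helper that assumes the usual nltk_data layout of category/resource (for example tokenizers/punkt) and deletes every other entry, including the leftover zip archives. Run it on a copy of the folder, since the deletion is permanent.

```python
import os
import shutil


def prune_nltk_data(data_dir, keep):
    """Delete everything under data_dir except the relative paths in keep.

    keep holds paths such as "tokenizers/punkt" or "corpora/stopwords".
    Returns the relative paths that were removed.
    """
    keep_abs = {os.path.normpath(os.path.join(data_dir, k)) for k in keep}
    removed = []
    for category in sorted(os.listdir(data_dir)):
        category_path = os.path.join(data_dir, category)
        if not os.path.isdir(category_path):
            # Stray file at the top level of nltk_data.
            os.remove(category_path)
            removed.append(category)
            continue
        for entry in sorted(os.listdir(category_path)):
            entry_path = os.path.normpath(os.path.join(category_path, entry))
            if entry_path in keep_abs:
                continue
            if os.path.isdir(entry_path):
                shutil.rmtree(entry_path)
            else:
                os.remove(entry_path)  # e.g. punkt.zip next to punkt/
            removed.append(os.path.join(category, entry))
        # Drop categories that ended up empty.
        if not os.listdir(category_path):
            os.rmdir(category_path)
    return removed
```

For example, prune_nltk_data("nltk_data", ["tokenizers/punkt"]) would leave only the unpacked punkt tokenizer in the folder that gets packaged.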