Lambda not supporting NLTK file size

Problem description

I am writing a Python script that analyses a piece of text and returns the data in JSON format. I am using NLTK to analyze the data. Basically, this is my flow:

Create an endpoint (API Gateway) -> call my Lambda function -> return JSON with the required data.
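That flow can be sketched as a single Lambda handler. This is a minimal sketch, assuming API Gateway's Lambda proxy integration and a JSON request body with a hypothetical `text` field; `analyze` is a hypothetical stand-in for the real analysis, and the tokenizer is passed in so the helpers can be exercised without NLTK installed.

```python
import json
from typing import Callable, List


def analyze(text: str, tokenize: Callable[[str], List[str]]) -> dict:
    """Hypothetical analysis step: tokenize the text and count the tokens."""
    tokens = tokenize(text)
    return {"tokens": tokens, "token_count": len(tokens)}


def build_response(status: int, payload: dict) -> dict:
    """Wrap a payload in the response shape the Lambda proxy integration expects."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),  # body must be a string, not a dict
    }


def handler(event, context):
    """Lambda entry point (the name must match the handler configured for the function)."""
    try:
        # With proxy integration the request body arrives as a JSON string.
        text = json.loads(event.get("body") or "{}")["text"]
    except (ValueError, KeyError, TypeError):
        return build_response(400, {"error": "expected a JSON body with a 'text' field"})

    # Imported here only so the helpers above can be tested without NLTK.
    # word_tokenize needs the punkt tokenizer data at runtime
    # (newer NLTK releases look for punkt_tab instead).
    from nltk.tokenize import word_tokenize

    return build_response(200, analyze(text, word_tokenize))
```

Calling `handler` with a body such as `{"text": "Hello world."}` is exactly the point where the missing punkt data surfaces as the error below.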

I wrote my script and deployed it to Lambda, but I ran into this issue:

Resource punkt not found. Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

Searched in:
  - '/home/sbx_user1058/nltk_data'
  - '/usr/share/nltk_data'
  - '/usr/local/share/nltk_data'
  - '/usr/lib/nltk_data'
  - '/usr/local/lib/nltk_data'
  - '/var/lang/nltk_data'
  - '/var/lang/lib/nltk_data'
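The "Searched in" list is the set of directories NLTK tried, in order, on the Lambda host. If punkt was downloaded on a development machine, it typically lands in a directory such as ~/nltk_data there, which is not part of the deployment package, so none of these paths contain it. The lookup can be imitated with the standard library; this is a simplified sketch of the idea with a hypothetical helper name, not NLTK's own implementation (the real `nltk.data.find` also handles zipped resources, for example).

```python
import os
from typing import Iterable


def find_resource(resource: str, search_dirs: Iterable[str]) -> str:
    """Return the first directory-based match for `resource`, searching in order.

    Simplified imitation of NLTK's lookup: only plain directories are checked.
    """
    searched = []
    for base in search_dirs:
        candidate = os.path.join(base, resource)
        if os.path.exists(candidate):
            return candidate
        searched.append(base)
    # Mirrors the spirit of NLTK's error: report every directory that was tried.
    raise LookupError(
        f"Resource {resource!r} not found. Searched in: " + ", ".join(searched)
    )
```

Running it against the paths from the error message on the Lambda host would raise `LookupError` for `"tokenizers/punkt"`, matching what the script reports.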

Even after downloading 'punkt', my script still gave me the same error. I tried the solutions here:

Optimizing a Python script that extracts and processes large data files

but the issue is that the nltk_data folder is huge, while Lambda limits the size of the deployment package.
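To see how much of the package nltk_data actually accounts for, its on-disk size can be measured before deploying. A minimal sketch with hypothetical helper names: Lambda's quota applies to the unzipped package (function code plus any layers) and is documented by AWS as 250 MB at the time of writing; the constant below encodes that figure as an assumption, so check the current Lambda quotas before relying on it.

```python
import os

# Assumed quota for the unzipped deployment package (code + layers), roughly 250 MB.
LAMBDA_UNZIPPED_LIMIT_BYTES = 250 * 1024 * 1024


def dir_size_bytes(path: str) -> int:
    """Total size of all regular files under `path`, ignoring symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def fits_in_lambda(path: str, other_bytes: int = 0) -> bool:
    """True if `path` plus the rest of the package (`other_bytes`) stays under the quota."""
    return dir_size_bytes(path) + other_bytes <= LAMBDA_UNZIPPED_LIMIT_BYTES
```

For example, `dir_size_bytes(os.path.expanduser("~/nltk_data")) / 1e6` gives the size in MB of a full local download, which makes it easy to compare before and after pruning.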

How can I fix this issue? Or where else can I run my script and still integrate it with an API call?

I am using Serverless to deploy my Python scripts.

Recommended answer

There are two things you can do:

  1. The error suggests that the path is not defined correctly. Perhaps set it as an environment variable?

import os, nltk
nltk.data.path.append(os.path.abspath('/var/task/nltk_data/'))  # NLTK's data search list, not sys.path

Or this way:

     1. Once you have run nltk.download(), copy the downloaded data into the root folder of your AWS Lambda application, in a directory named "nltk_data".

In the Lambda function dashboard (in the AWS console), add NLTK_DATA=./nltk_data as a key-value environment variable.
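The two options above (adding the bundled folder to the search path, or pointing NLTK_DATA at it) can be sketched in Python. Two details matter: `sys.path` only controls where Python finds modules, while NLTK looks for data in the list `nltk.data.path`; and NLTK reads NLTK_DATA once, when `nltk` is first imported, splitting it on `os.pathsep` and placing those directories ahead of its defaults. A relative entry such as ./nltk_data is resolved against the working directory, which for a Lambda function is normally the code directory (LAMBDA_TASK_ROOT, usually /var/task). The helper names are hypothetical, and `nltk_data_dirs` only mirrors NLTK's parsing for illustration; the path helper takes the search list as an argument so it can be tried without NLTK installed.

```python
import os
from typing import Dict, List, Optional


def add_bundled_nltk_data(search_path: List[str], task_root: Optional[str] = None) -> str:
    """Prepend <task_root>/nltk_data to a search list such as nltk.data.path."""
    root = task_root or os.environ.get("LAMBDA_TASK_ROOT") or os.getcwd()
    bundled = os.path.join(root, "nltk_data")
    if bundled not in search_path:
        search_path.insert(0, bundled)  # searched before the default locations
    return bundled


def nltk_data_dirs(env: Dict[str, str]) -> List[str]:
    """Mirror how NLTK_DATA is split into directories (empty entries dropped)."""
    return [d for d in env.get("NLTK_DATA", "").split(os.pathsep) if d]


# Usage inside the Lambda module, at import time rather than inside the handler:
#
#   import nltk
#   add_bundled_nltk_data(nltk.data.path)
#
# or, to set NLTK_DATA from code instead of the console, do it before the first import:
#
#   os.environ.setdefault("NLTK_DATA", os.path.join(os.environ["LAMBDA_TASK_ROOT"], "nltk_data"))
#   import nltk
```

Building the path from LAMBDA_TASK_ROOT avoids depending on the working directory, which a relative ./nltk_data silently does.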


  2. Reduce the size of the NLTK downloads, since you won't need all of them.

     1. Delete all zip files and keep only the parts you need. For example, for stopwords, keep nltk_data/corpora/stopwords and delete the rest.

Or, if you need tokenizers, keep nltk_data/tokenizers/punkt. Most of these can be downloaded separately (python -m nltk.downloader punkt), after which you copy over the files.
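The pruning steps above can be scripted so the deployed folder only ever contains what the function loads. A minimal sketch, with a hypothetical function name and resource list (adjust them to what your code actually uses): it copies only the listed resource folders from a full nltk_data directory into the folder you deploy, so sibling archives such as tokenizers/punkt.zip are left behind, and any .zip files nested inside the copied folders are skipped as well.

```python
import os
import shutil
from typing import Iterable


def copy_needed_resources(src: str, dst: str, resources: Iterable[str]) -> None:
    """Copy selected resource folders (e.g. "tokenizers/punkt") from src to dst.

    Only the listed folders are copied, and .zip files inside them are skipped.
    """
    for rel in resources:
        source = os.path.join(src, rel)
        if not os.path.isdir(source):
            raise FileNotFoundError(f"{source} not found; download it first")
        shutil.copytree(
            source,
            os.path.join(dst, rel),  # missing parent folders are created
            ignore=shutil.ignore_patterns("*.zip"),
            dirs_exist_ok=True,  # requires Python 3.8+
        )
```

For example, on the build machine run `python -m nltk.downloader -d /tmp/full_nltk_data punkt stopwords` (or `nltk.download('punkt', download_dir='/tmp/full_nltk_data')`), then call `copy_needed_resources('/tmp/full_nltk_data', 'nltk_data', ['tokenizers/punkt', 'corpora/stopwords'])` from the project root before deploying.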
