Change nltk.download() path directory from default ~/nltk_data


Question

I was trying to download/update Python NLTK packages on a computing server, and it returned an [Errno 122] Disk quota exceeded error.

Specifically:

[nltk_data] Downloading package stop words to /home/sh2264/nltk_data...
[nltk_data] Error downloading u'stopwords' from
[nltk_data] <https://raw.githubusercontent.com/nltk/nltk_data/gh-
[nltk_data] pages/packages/corpora/stopwords.zip>: [Errno 122]
[nltk_data] Disk quota exceeded:
[nltk_data] u'/home/sh2264/nltk_data/corpora/stopwords.zip
False

How can I change the entire path for NLTK packages, and what other changes should I make to ensure that NLTK loads without errors?

Recommended answer

This can be configured either from the command line or in code (nltk.download(..., download_dir=...)), or through the GUI. Bizarrely, nltk seems to ignore its own environment variable NLTK_DATA entirely and defaults its download directories to a standard set of five paths, regardless of whether NLTK_DATA is defined and where it points, and regardless of whether nltk's five default directories even exist on the machine or architecture(!). Some of this is documented in Installing NLTK Data (http://www.nltk.org/data.html), although it is incomplete and somewhat buried; it is reproduced below with clearer formatting:
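For the quota problem in the question, a minimal sketch of the download_dir approach might look like the following. The helper names prepare_nltk_data_dir and download_to, and the /scratch path in the usage note, are hypothetical; nltk.download(..., download_dir=...) and the nltk.data.path search list are the parts that come from NLTK itself.

```python
import os
from pathlib import Path


def prepare_nltk_data_dir(base_dir):
    """Create <base_dir>/nltk_data and point NLTK_DATA at it (hypothetical helper)."""
    target = Path(base_dir).expanduser() / "nltk_data"
    target.mkdir(parents=True, exist_ok=True)
    # Export for this process and any child processes it starts.
    os.environ["NLTK_DATA"] = str(target)
    return target


def download_to(target, package="stopwords"):
    """Download one package into target and make sure nltk searches there."""
    import nltk  # imported lazily so the helper above works without nltk

    # Put the custom directory first in nltk's search path for later loads.
    if str(target) not in nltk.data.path:
        nltk.data.path.insert(0, str(target))
    # nltk.download returns True on success and False on failure.
    return nltk.download(package, download_dir=str(target))
```

For example, download_to(prepare_nltk_data_dir("/scratch/myuser")) would fetch stopwords into /scratch/myuser/nltk_data, assuming that location exists and sits outside the home-directory quota.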

Command line installation

The downloader will search for an existing nltk_data directory to install NLTK data. If one does not exist, it will attempt to create one in a central location (when using an administrator account) or otherwise in the user's filespace. If necessary, run the download command from an administrator account, or using sudo. The recommended system location is:

  • C:\nltk_data (Windows);
  • /usr/local/share/nltk_data (Mac); and
  • /usr/share/nltk_data (Unix).

You can use the -d flag to specify a different location (but if you do this, be sure to set the NLTK_DATA environment variable accordingly).

  • Run the command python -m nltk.downloader all

To ensure central installation, run the command: sudo python -m nltk.downloader -d /usr/local/share/nltk_data all

But really they should say: sudo python -m nltk.downloader -d $NLTK_DATA all
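The same command-line invocation can also be scripted from Python. The sketch below only assembles and runs the python -m nltk.downloader -d ... command quoted above; the function names downloader_command and run_downloader are made up for illustration, and it assumes no sudo is needed for the chosen directory.

```python
import os
import subprocess
import sys


def downloader_command(package="all", data_dir=None):
    """Build the `python -m nltk.downloader -d <dir> <package>` command as a list."""
    # Fall back to NLTK_DATA, mirroring `-d $NLTK_DATA` in the shell.
    data_dir = data_dir or os.environ.get("NLTK_DATA")
    if not data_dir:
        raise ValueError("pass data_dir or set NLTK_DATA first")
    return [sys.executable, "-m", "nltk.downloader", "-d", data_dir, package]


def run_downloader(package="all", data_dir=None):
    """Run the downloader in a child process using the current interpreter."""
    # check=True raises CalledProcessError if the process exits non-zero.
    subprocess.run(downloader_command(package, data_dir), check=True)
```

Building the command as a list rather than a single string avoids shell quoting issues if the data directory contains spaces.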

As for which path NLTK_DATA should point to, nltk doesn't really give any proper guidance, but it should be a generic standalone path that is not under any install tree (so not under <python-install-directory>/lib/site-packages) or any user directory. Hence /usr/local/share, /opt/share or similar. On macOS 10.7+, /usr and thus /usr/local/ are hidden by default these days, so /opt/share may well be a better choice. Alternatively, run chflags nohidden /usr/local/share.
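When choosing among candidate locations like these, a quick check of which ones exist and are writable can save a failed download. This is a rough sketch, not part of nltk: first_usable_dir is a hypothetical helper, and shutil.disk_usage reports free space on the filesystem, which is not the same thing as a per-user quota such as the one in the question.

```python
import os
import shutil


def first_usable_dir(candidates, min_free_bytes=0):
    """Return the first existing, writable directory with enough free space."""
    for path in candidates:
        path = os.path.expanduser(path)
        # Skip anything that is missing or that this user cannot write into.
        if not (os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)):
            continue
        # disk_usage reports filesystem free space, not a per-user quota.
        if shutil.disk_usage(path).free >= min_free_bytes:
            return path
    return None
```

For example, first_usable_dir(["/usr/local/share", "/opt/share"]) returns whichever of the two is usable first, or None if neither is.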
