Install nltk on Docker


Question

I am new to Docker, and I am trying to install some NLTK packages in a Docker image. Here is my Dockerfile:

FROM python:3-onbuild

RUN python -m libs.py

COPY start.sh /libs.py

COPY start.sh /start.sh

EXPOSE 8000

CMD ["/start.sh"]

Here is my libs.py, which lists the NLTK packages to download:

import nltk
nltk.data.path.append('./')
nltk.download('wordnet')
nltk.download('pros_cons')
nltk.download('snowball_data')
nltk.download('averaged_perceptron_tagger')
nltk.download('averaged_perceptron_tagger_ru')
nltk.download('punkt')
nltk.download('universal_tagset')
nltk.download('maxent_treebank_pos_tagger')
nltk.download('hmm_treebank_pos_tagger')
nltk.download('reuters')
nltk.download('treebank')
nltk.download('vader_lexicon')
nltk.download('porter_test')
nltk.download('rslp')

The Docker image is created successfully, but when I try to use these packages, it throws an error:

LookupError: 
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  Searched in:
    - '/root/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/usr/local/nltk_data'
    - '/usr/local/lib/nltk_data'
    - ''
**********************************************************************

Can anybody tell me why the NLTK packages are not installed? Thanks.

Recommended answer

It looks like you have to create a user inside Docker. You should try to avoid being root in Docker (which is the default).

Nevertheless, you can set download_dir when calling nltk.download():

download(self, info_or_id=None, download_dir=None, quiet=False, force=False, prefix='[nltk_data] ', halt_on_error=True, raise_on_error=False):

If no value is set for download_dir, it will try to save to the default path:

    # decide where we're going to save things to.
    if self._download_dir is None:
        self._download_dir = self.default_download_dir()

More specifically: https://github.com/nltk/nltk/blob/develop/nltk/downloader.py#L919

def default_download_dir(self):
    """
    Return the directory to which packages will be downloaded by
    default.  This value can be overridden using the constructor,
    or on a case-by-case basis using the ``download_dir`` argument when
    calling ``download()``.
    On Windows, the default download directory is
    ``PYTHONHOME/lib/nltk``, where *PYTHONHOME* is the
    directory containing Python, e.g. ``C:\\Python25``.
    On all other platforms, the default directory is the first of
    the following which exists or which can be created with write
    permission: ``/usr/share/nltk_data``, ``/usr/local/share/nltk_data``,
    ``/usr/lib/nltk_data``, ``/usr/local/lib/nltk_data``, ``~/nltk_data``.
    """

So, when running as root, it saves the files in /root/nltk_data/.
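The selection rule can be illustrated with a small standard-library sketch. This is not NLTK's code: the function name pick_download_dir and its parameters are hypothetical, and it makes the simplifying assumption that only directories which already exist and are writable are chosen, with the home directory as the fallback.

```python
import os


def pick_download_dir(candidates, home):
    """Return the first candidate that exists and is writable.

    Simplified illustration of the rule in the docstring above;
    falls back to <home>/nltk_data when no candidate qualifies.
    """
    for path in candidates:
        if os.path.isdir(path) and os.access(path, os.W_OK):
            return path
    return os.path.join(home, "nltk_data")
```

Under that assumption, a fresh image in which none of the system-wide nltk_data directories exist would fall through to the home directory, which is /root for the root user.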

It looks like you're accessing the / directory when the Docker image runs CMD ["/start.sh"], so perhaps there is a permission issue with /root/nltk_data.

Explicitly set the path where you want the nltk_data directory to be downloaded:

nltk.download('popular', download_dir='/path/to/nltk_data/')

When running a new Python instance, add the same path to NLTK's search path:

nltk.data.path.append('/path/to/nltk_data/')
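Putting the two steps together, a libs.py along the following lines might work. It is a sketch under assumptions, not a verified fix: download_packages, NLTK_DATA_DIR and PACKAGES are hypothetical names, the directory is chosen only because it appears in the "Searched in" list of the error above, and the code assumes nltk.download() returns a falsy value when a download fails.

```python
import os

# Assumed target: one of the directories listed under "Searched in" above,
# so no extra nltk.data.path.append() should be needed at runtime.
NLTK_DATA_DIR = "/usr/local/share/nltk_data"

# A subset of the packages from the question, for illustration.
PACKAGES = ["wordnet", "punkt", "vader_lexicon"]


def download_packages(packages, download_dir, download=None):
    """Download each package into download_dir; return the names that failed."""
    if download is None:
        import nltk  # imported lazily so the helper can be tried without NLTK
        download = nltk.download
    os.makedirs(download_dir, exist_ok=True)
    failed = []
    for name in packages:
        # download_dir is the keyword argument shown in the signature above
        if not download(name, download_dir=download_dir):
            failed.append(name)
    return failed
```

Calling download_packages(PACKAGES, NLTK_DATA_DIR) during the image build and exiting with a non-zero status when the returned list is not empty would make a failed download break the build instead of surfacing later as a LookupError.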

See also: How to configure the nltk data directory from code?
