Install nltk on Docker
Question
I am new to Docker, and I am trying to install some NLTK packages in a Docker image. Here is my Dockerfile:
FROM python:3-onbuild
RUN python -m libs.py
COPY start.sh /libs.py
COPY start.sh /start.sh
EXPOSE 8000
CMD ["/start.sh"]
Here is my libs.py, which contains the NLTK packages to download:
import nltk
nltk.data.path.append('./')
nltk.download('wordnet')
nltk.download('pros_cons')
nltk.download('snowball_data')
nltk.download('averaged_perceptron_tagger')
nltk.download('averaged_perceptron_tagger_ru')
nltk.download('punkt')
nltk.download('universal_tagset')
nltk.download('maxent_treebank_pos_tagger')
nltk.download('hmm_treebank_pos_tagger')
nltk.download('reuters')
nltk.download('treebank')
nltk.download('vader_lexicon')
nltk.download('porter_test')
nltk.download('rslp')
The Docker image is created successfully, but when I try to use these packages, it throws this error:
LookupError:
**********************************************************************
Resource punkt not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt')
Searched in:
- '/root/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- '/usr/local/nltk_data'
- '/usr/local/lib/nltk_data'
- ''
**********************************************************************
Can anybody tell me why the NLTK packages are not installed? Thanks.
Recommended answer
It looks like you have to create a user inside Docker. You should try to avoid running as root in Docker (which is the default).
Nevertheless, you can set download_dir when calling nltk.download():
download(self, info_or_id=None, download_dir=None, quiet=False, force=False, prefix='[nltk_data] ', halt_on_error=True, raise_on_error=False):
If no value is set for download_dir, it will try to save to the default path:
# decide where we're going to save things to.
if self._download_dir is None:
    self._download_dir = self.default_download_dir()
More specifically: https://github.com/nltk/nltk/blob/develop/nltk/downloader.py#L919
def default_download_dir(self):
    """
    Return the directory to which packages will be downloaded by
    default.  This value can be overridden using the constructor,
    or on a case-by-case basis using the ``download_dir`` argument when
    calling ``download()``.

    On Windows, the default download directory is
    ``PYTHONHOME/lib/nltk``, where *PYTHONHOME* is the
    directory containing Python, e.g. ``C:\\Python25``.

    On all other platforms, the default directory is the first of
    the following which exists or which can be created with write
    permission: ``/usr/share/nltk_data``, ``/usr/local/share/nltk_data``,
    ``/usr/lib/nltk_data``, ``/usr/local/lib/nltk_data``, ``~/nltk_data``.
    """
So it saves the files to /root/nltk_data/.
It looks like you're accessing the / directory when the Docker image runs CMD ["/start.sh"], so perhaps there are some permission settings on /root/nltk_data.
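To check that guess, it may help to inspect each directory from the "Searched in" list inside the running container. The sketch below is a hypothetical diagnostic (the name describe_search_paths is not part of NLTK); it assumes the resource sits under a relative path such as tokenizers/punkt, either unpacked or as a .zip archive next to it.

```python
import os

# Directories copied from the "Searched in" part of the error message.
SEARCHED = [
    "/root/nltk_data",
    "/usr/share/nltk_data",
    "/usr/local/share/nltk_data",
    "/usr/lib/nltk_data",
    "/usr/local/lib/nltk_data",
    "/usr/local/nltk_data",
]


def describe_search_paths(search_paths, resource="tokenizers/punkt"):
    """For each directory, report whether it exists, whether the current
    user can read it, and whether it appears to hold the resource.

    Hypothetical helper; the resource layout is an assumption.
    """
    report = []
    for base in search_paths:
        target = os.path.join(base, *resource.split("/"))
        report.append({
            "path": base,
            "exists": os.path.isdir(base),
            "readable": os.access(base, os.R_OK),
            "has_resource": os.path.exists(target)
            or os.path.exists(target + ".zip"),
        })
    return report


for row in describe_search_paths(SEARCHED):
    print(row)
```

If no row reports has_resource as true, the download step most likely never wrote the data into the image; if a row has the resource but is not readable, permissions are the more likely cause.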
Explicitly set the path where you want the nltk_data directory to be downloaded:
nltk.download('popular', download_dir='/path/to/nltk_data/')
Then, when running a new Python instance, add that directory to NLTK's search path:
nltk.data.path.append('/path/to/nltk_data/')
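The two steps above could be combined into one small helper, sketched here under some assumptions: NLTK_DATA_DIR is a hypothetical location, the package list is a shortened version of the one in the question, and the download parameter exists only so the helper can be exercised without network access. It relies on nltk.download() returning a boolean success flag.

```python
import os

# Hypothetical location; any directory the runtime user can read would do.
NLTK_DATA_DIR = "/usr/local/share/nltk_data"

# Shortened example list; extend it with the packages you need.
PACKAGES = ["wordnet", "punkt", "averaged_perceptron_tagger"]


def download_packages(packages, download_dir=NLTK_DATA_DIR, download=None):
    """Download each package into download_dir and return the names that failed."""
    if download is None:
        import nltk  # imported lazily so the helper can be tested without NLTK
        download = nltk.download
    os.makedirs(download_dir, exist_ok=True)
    failed = []
    for name in packages:
        # nltk.download() returns True on success and False on failure.
        if not download(name, download_dir=download_dir, quiet=True):
            failed.append(name)
    return failed


# Build-time usage (for example from libs.py):
#     failed = download_packages(PACKAGES)
#     if failed:
#         raise SystemExit("could not download: " + ", ".join(failed))
#
# Run-time usage, before loading any resource:
#     import nltk
#     nltk.data.path.append(NLTK_DATA_DIR)
```

Failing the build when a download fails makes a missing resource visible at image build time instead of at run time. In a Dockerfile, a script like this has to be copied into the image before the RUN step that executes it.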
See also: How to config nltk data directory from code?