nltk data fails to install on Ubuntu 14.04 on an AWS c4.xlarge instance


Question

I am testing an installation script on an Ubuntu 14.04 instance in AWS. The instance type is c4.xlarge with a 50 GB EBS volume. For every test run I start from a freshly created instance.

The nltk data download consistently fails on the panlex_lite package.

Any ideas? (I have attached a lot of output from the installation so you can see what I see. Sorry for the long listing.)

Thanks

The commands I run, up to and including the nltk data download, are:

sudo apt-get install python3-setuptools -y
sudo apt-get install python3.4-dev -y

# Installing Python packages
sudo easy_install3 pip
sudo easy_install3 inflect
sudo easy_install3 elasticsearch
sudo easy_install3 geopy
sudo easy_install3 geojson
sudo easy_install3 simplejson
sudo easy_install3 python_instagram
sudo easy_install3 flickrapi
sudo easy_install3 oauth
sudo easy_install3 xlrd
sudo easy_install3 pytz
sudo easy_install3 tweepy
sudo easy_install3 BeautifulSoup4
sudo easy_install3 psutil
sudo pip3 install -U nltk
sudo pip3 install -U numpy
sudo python3 -m nltk.downloader all

The last line fails. The log, starting from the end of the psutil installation, is:

Finished processing dependencies for psutil
sudo: unable to resolve host ip-172-30-0-207
The directory '/home/ubuntu/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/home/ubuntu/.cache/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Collecting nltk
  Downloading nltk-3.1.tar.gz (1.1MB)
Installing collected packages: nltk
  Running setup.py install for nltk
Successfully installed nltk-3.1
sudo: unable to resolve host ip-172-30-0-207
The directory '/home/ubuntu/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/home/ubuntu/.cache/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Collecting numpy
  Downloading numpy-1.10.1.tar.gz (4.0MB)
Installing collected packages: numpy
  Running setup.py install for numpy
Successfully installed numpy-1.10.1
sudo: unable to resolve host ip-172-30-0-207
[nltk_data] Downloading collection 'all'
[nltk_data]    | 
[nltk_data]    | Downloading package abc to /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/abc.zip.
[nltk_data]    | Downloading package alpino to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/alpino.zip.
[nltk_data]    | Downloading package biocreative_ppi to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/biocreative_ppi.zip.
[nltk_data]    | Downloading package brown to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/brown.zip.
[nltk_data]    | Downloading package brown_tei to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/brown_tei.zip.
[nltk_data]    | Downloading package cess_cat to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/cess_cat.zip.
[nltk_data]    | Downloading package cess_esp to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/cess_esp.zip.
[nltk_data]    | Downloading package chat80 to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/chat80.zip.
[nltk_data]    | Downloading package city_database to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/city_database.zip.
[nltk_data]    | Downloading package cmudict to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/cmudict.zip.
[nltk_data]    | Downloading package comparative_sentences to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/comparative_sentences.zip.
[nltk_data]    | Downloading package comtrans to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    | Downloading package conll2000 to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/conll2000.zip.
[nltk_data]    | Downloading package conll2002 to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/conll2002.zip.
[nltk_data]    | Downloading package conll2007 to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    | Downloading package crubadan to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/crubadan.zip.
[nltk_data]    | Downloading package dependency_treebank to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/dependency_treebank.zip.
[nltk_data]    | Downloading package europarl_raw to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/europarl_raw.zip.
[nltk_data]    | Downloading package floresta to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/floresta.zip.
[nltk_data]    | Downloading package framenet_v15 to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/framenet_v15.zip.
[nltk_data]    | Downloading package gazetteers to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/gazetteers.zip.
[nltk_data]    | Downloading package genesis to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/genesis.zip.
[nltk_data]    | Downloading package gutenberg to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/gutenberg.zip.
[nltk_data]    | Downloading package ieer to /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/ieer.zip.
[nltk_data]    | Downloading package inaugural to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/inaugural.zip.
[nltk_data]    | Downloading package indian to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/indian.zip.
[nltk_data]    | Downloading package jeita to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    | Downloading package kimmo to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/kimmo.zip.
[nltk_data]    | Downloading package knbc to /home/ubuntu/nltk_data...
[nltk_data]    | Downloading package lin_thesaurus to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/lin_thesaurus.zip.
[nltk_data]    | Downloading package mac_morpho to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/mac_morpho.zip.
[nltk_data]    | Downloading package machado to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    | Downloading package masc_tagged to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    | Downloading package moses_sample to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping models/moses_sample.zip.
[nltk_data]    | Downloading package movie_reviews to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/movie_reviews.zip.
[nltk_data]    | Downloading package names to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/names.zip.
[nltk_data]    | Downloading package nombank.1.0 to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    | Downloading package nps_chat to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/nps_chat.zip.
[nltk_data]    | Downloading package oanc_masc to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    | Downloading package omw to /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/omw.zip.
[nltk_data]    | Downloading package opinion_lexicon to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/opinion_lexicon.zip.
[nltk_data]    | Downloading package paradigms to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/paradigms.zip.
[nltk_data]    | Downloading package pil to /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/pil.zip.
[nltk_data]    | Downloading package pl196x to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/pl196x.zip.
[nltk_data]    | Downloading package ppattach to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/ppattach.zip.
[nltk_data]    | Downloading package problem_reports to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/problem_reports.zip.
[nltk_data]    | Downloading package propbank to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    | Downloading package ptb to /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/ptb.zip.
[nltk_data]    | Downloading package oanc_masc to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Package oanc_masc is already up-to-date!
[nltk_data]    | Downloading package product_reviews_1 to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/product_reviews_1.zip.
[nltk_data]    | Downloading package product_reviews_2 to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/product_reviews_2.zip.
[nltk_data]    | Downloading package pros_cons to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/pros_cons.zip.
[nltk_data]    | Downloading package qc to /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/qc.zip.
[nltk_data]    | Downloading package reuters to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    | Downloading package rte to /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/rte.zip.
[nltk_data]    | Downloading package semcor to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    | Downloading package senseval to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/senseval.zip.
[nltk_data]    | Downloading package sentiwordnet to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/sentiwordnet.zip.
[nltk_data]    | Downloading package sentence_polarity to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/sentence_polarity.zip.
[nltk_data]    | Downloading package shakespeare to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/shakespeare.zip.
[nltk_data]    | Downloading package sinica_treebank to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/sinica_treebank.zip.
[nltk_data]    | Downloading package smultron to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/smultron.zip.
[nltk_data]    | Downloading package state_union to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/state_union.zip.
[nltk_data]    | Downloading package stopwords to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/stopwords.zip.
[nltk_data]    | Downloading package subjectivity to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/subjectivity.zip.
[nltk_data]    | Downloading package swadesh to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/swadesh.zip.
[nltk_data]    | Downloading package switchboard to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/switchboard.zip.
[nltk_data]    | Downloading package timit to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/timit.zip.
[nltk_data]    | Downloading package toolbox to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/toolbox.zip.
[nltk_data]    | Downloading package treebank to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/treebank.zip.
[nltk_data]    | Downloading package twitter_samples to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/twitter_samples.zip.
[nltk_data]    | Downloading package udhr to /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/udhr.zip.
[nltk_data]    | Downloading package udhr2 to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/udhr2.zip.
[nltk_data]    | Downloading package unicode_samples to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/unicode_samples.zip.
[nltk_data]    | Downloading package universal_treebanks_v20 to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    | Downloading package verbnet to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/verbnet.zip.
[nltk_data]    | Downloading package webtext to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/webtext.zip.
[nltk_data]    | Downloading package wordnet to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/wordnet.zip.
[nltk_data]    | Downloading package wordnet_ic to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/wordnet_ic.zip.
[nltk_data]    | Downloading package words to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/words.zip.
[nltk_data]    | Downloading package ycoe to /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/ycoe.zip.
[nltk_data]    | Downloading package rslp to /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping stemmers/rslp.zip.
[nltk_data]    | Downloading package hmm_treebank_pos_tagger to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping taggers/hmm_treebank_pos_tagger.zip.
[nltk_data]    | Downloading package maxent_treebank_pos_tagger to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping taggers/maxent_treebank_pos_tagger.zip.
[nltk_data]    | Downloading package universal_tagset to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping taggers/universal_tagset.zip.
[nltk_data]    | Downloading package maxent_ne_chunker to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping chunkers/maxent_ne_chunker.zip.
[nltk_data]    | Downloading package punkt to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping tokenizers/punkt.zip.
[nltk_data]    | Downloading package book_grammars to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping grammars/book_grammars.zip.
[nltk_data]    | Downloading package sample_grammars to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping grammars/sample_grammars.zip.
[nltk_data]    | Downloading package spanish_grammars to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping grammars/spanish_grammars.zip.
[nltk_data]    | Downloading package basque_grammars to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping grammars/basque_grammars.zip.
[nltk_data]    | Downloading package large_grammars to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping grammars/large_grammars.zip.
[nltk_data]    | Downloading package tagsets to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping help/tagsets.zip.
[nltk_data]    | Downloading package snowball_data to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    | Downloading package bllip_wsj_no_aux to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping models/bllip_wsj_no_aux.zip.
[nltk_data]    | Downloading package word2vec_sample to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping models/word2vec_sample.zip.
[nltk_data]    | Downloading package panlex_swadesh to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    | Downloading package mte_teip5 to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/mte_teip5.zip.
[nltk_data]    | Downloading package averaged_perceptron_tagger to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping taggers/averaged_perceptron_tagger.zip.
[nltk_data]    | Downloading package panlex_lite to
[nltk_data]    |     /home/ubuntu/nltk_data...
[nltk_data]    |   Unzipping corpora/panlex_lite.zip.

Error installing package. Retry? [n/y/e]

It is not a disk space problem either:

Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/xvda1      51466360 6582776  42687092  14% /
none                   4       0         4   0% /sys/fs/cgroup
udev             3824796       8   3824788   1% /dev
tmpfs             765952     360    765592   1% /run
none                5120       0      5120   0% /run/lock
none             3829752       0   3829752   0% /run/shm
none              102400       0    102400   0% /run/user

Recommended answer

I came across the same issue when following an old AWS tutorial on sentiment analysis of tweet data. The tutorial uses a bootstrap script to install NLTK and its data on an EMR cluster with this command:

$ sudo python -m nltk.downloader -d /usr/share/nltk_data all

Running this command, I hit exactly the same panlex_lite installation failure. Because this is a bootstrap script, the prompt

Error installing package. Retry? [n/y/e]

causes the bootstrap action to fail, and the EMR cluster gets terminated. :P

I worked around this by: A) assuming this package is not essential, and B) modifying the command to pass 'n' automatically so that the script does not wait indefinitely.

$ yes n | sudo python -m nltk.downloader -d /usr/share/nltk_data all

Hope this helps.

Update 25 Jan 2016: the data set named 'panlex_lite' still causes the installation to fail.
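
If answering the prompt with 'n' feels fragile, another option might be to drive the downloader from Python and skip the failing package altogether. The following is only a sketch, not something tested in this thread: the helper names `select_packages` and `download_all_except` are made up for illustration, and it assumes the `Downloader.packages()` and `Downloader.download(..., download_dir=..., quiet=...)` calls behave as in recent NLTK releases, which may differ from nltk 3.1.

```python
SKIP = {"panlex_lite"}  # the package reported to fail in this thread


def select_packages(all_ids, skip=SKIP):
    """Return the package ids to download, in order, minus the skipped ones."""
    return [pid for pid in all_ids if pid not in skip]


def download_all_except(download_dir, skip=SKIP):
    """Download every indexed NLTK package except those in `skip`.

    Returns the ids of packages whose download did not report success.
    Needs network access and an installed nltk (assumption: recent release).
    """
    # Imported lazily so select_packages() can be used without nltk installed.
    from nltk.downloader import Downloader

    downloader = Downloader()
    all_ids = [pkg.id for pkg in downloader.packages()]
    failed = []
    for pid in select_packages(all_ids, skip):
        # A falsy return value is treated as a failed download here.
        ok = downloader.download(pid, download_dir=download_dir, quiet=True)
        if not ok:
            failed.append(pid)
    return failed
```

Calling `download_all_except("/usr/share/nltk_data")` from a script run with sufficient permissions would then play the role of the `nltk.downloader ... all` command above, with the returned list showing which packages, if any, still need attention.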
