nltk PlaintextCorpusReader sents and paras functions not working
Question
I cannot get the paras and sents function in the PlaintextCorpusReader to work. Here is the code I have:
import nltk
from nltk.corpus import PlaintextCorpusReader
corpus_root = './dir_root'
newcorpus = PlaintextCorpusReader(corpus_root, '.*') # Files you want to add
word_list = newcorpus.words('file1.txt')
sentence_list = newcorpus.sents('file1.txt')
paragraph_list = newcorpus.paras('file1.txt')
print(word_list)
print(sentence_list)
print(paragraph_list)
word_list comes out fine.
['__________________________________________________________________', 'Title', ...]
But, paragraph_list and sentence_list both give this error:
Traceback (most recent call last):
File "corpus.py", line 13, in <module>
print(sentence_list)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/collections.py", line 225, in __repr__
for elt in self:
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/corpus/reader/util.py", line 296, in iterate_from
tokens = self.read_block(self._stream)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/corpus/reader/plaintext.py", line 129, in _read_sent_block
for sent in self._sent_tokenizer.tokenize(para)])
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/data.py", line 956, in __getattr__
self.__load()
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/data.py", line 948, in __load
resource = load(self._path)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/data.py", line 808, in load
opened_resource = _open(resource_url)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/data.py", line 926, in _open
return find(path_, path + ['']).open()
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/data.py", line 648, in find
raise LookupError(resource_not_found)
LookupError:
**********************************************************************
Resource 'tokenizers/punkt/PY3/english.pickle' not found.
Please use the NLTK Downloader to obtain the resource: >>>
nltk.download()
Searched in:
- '/Users/username/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- ''
**********************************************************************
I tried using nltk.download() to download the file into the corpus, but that did not work either. It also did not seem like the right approach, since the PlaintextCorpusReader should handle that already. The paras and sents functions are part of the PlaintextCorpusReader. Is there a particular fileid I need to enter? Or is there some sort of regex argument it requires to find the sentences or paragraphs? The documentation and source code do not seem to say they need anything more than the words function does.
Answer
You're missing a data file ("resource") needed by the sentence tokenizer. Fix the problem by downloading the "punkt" resource under "Models" in the interactive downloader, or non-interactively by running this code once:
nltk.download("punkt")
To avoid running into this kind of problem repeatedly as you explore the nltk, I recommend downloading the "book" bundle now. It contains everything you're likely to need for a while.
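For reference, a minimal sketch of the non-interactive fix end to end (this assumes nltk is installed and can reach the download server; the sample sentence is mine, not from the question):

import nltk

# One-time, non-interactive download of the Punkt sentence tokenizer data,
# the 'tokenizers/punkt/...' resource the traceback reports as missing.
nltk.download("punkt", quiet=True)

# With the resource in place, sentence tokenization works, and so will
# newcorpus.sents() and newcorpus.paras(), which use it internally.
from nltk.tokenize import sent_tokenize
sents = sent_tokenize("First sentence. Second sentence.")
print(sents)

After this runs once, the data lives under one of the searched nltk_data directories (e.g. ~/nltk_data), so subsequent scripts need no download step.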