Stanford NLP Tagger via NLTK - tag_sents splits everything into chars

Problem Description

I'm hoping someone has experience with this, as I'm unable to find any comments online besides a 2015 bug report about the NER tagger, which is probably the same issue.

Anyway, I'm trying to batch-process text to get around the poorly performing base tagger. From what I understand, tag_sents should help.

from nltk.tag.stanford import StanfordPOSTagger
from nltk import word_tokenize
import nltk

stanford_model = 'stanford-postagger/models/english-bidirectional-distsim.tagger'
stanford_jar = 'stanford-postagger/stanford-postagger.jar'
tagger = StanfordPOSTagger(stanford_model, stanford_jar)
tagger.java_options = '-mx4096m'
text = "The quick brown fox jumps over the lazy dog."
print(tagger.tag_sents(text))

Except that no matter what I pass to the tag_sents method, the text gets split into chars instead of words. Does anyone know why it doesn't work properly? This works as expected...

tagger.tag(text)

I also tried splitting the sentence into tokens to see if that helped, but it got the same treatment.

Recommended Answer

The tag_sents function takes a list of lists of strings.

tagger.tag_sents([word_tokenize("The quick brown fox jumps over the lazy dog.")])
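
As an aside, the per-character output in the question isn't really a Stanford quirk: a Python string iterates character by character, so when tag_sents is handed a bare string it ends up treating single characters as tokens. A quick check in plain Python (no tagger required, word_tokenize as in the question) shows the difference:

from nltk import word_tokenize

text = "The quick brown fox jumps over the lazy dog."
print(list(text)[:6])           # ['T', 'h', 'e', ' ', 'q', 'u'] -- iterating a string yields characters
print(word_tokenize(text)[:4])  # ['The', 'quick', 'brown', 'fox'] -- the word tokens the tagger expects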

Here's a useful idiom:

 tagger.tag_sents(word_tokenize(sent) for sent in sent_tokenize(text))

where text is a string.
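
Putting the pieces together, here is a minimal end-to-end sketch (the model and jar paths are reused from the question and are assumptions about the local Stanford POS Tagger install; the sample text is just illustrative). tag_sents should come back with one list of (token, tag) pairs per input sentence:

from nltk import sent_tokenize, word_tokenize
from nltk.tag.stanford import StanfordPOSTagger

# Paths copied from the question -- adjust to wherever the Stanford tagger lives locally.
stanford_model = 'stanford-postagger/models/english-bidirectional-distsim.tagger'
stanford_jar = 'stanford-postagger/stanford-postagger.jar'

tagger = StanfordPOSTagger(stanford_model, stanford_jar)
tagger.java_options = '-mx4096m'

text = "The quick brown fox jumps over the lazy dog. The dog never stirred."

# One tokenized sentence per element: a list of lists of strings.
sentences = [word_tokenize(sent) for sent in sent_tokenize(text)]
tagged = tagger.tag_sents(sentences)

# Expect roughly one list of (token, tag) tuples per sentence, e.g.
# [[('The', 'DT'), ('quick', 'JJ'), ...], [('The', 'DT'), ('dog', 'NN'), ...]]
for sent in tagged:
    print(sent)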
