Which NLP toolkit to use in JAVA?
I would recommend a combination of POS tagging and string tokenizing to extract all the nouns from each abstract. Then use some sort of dictionary/hash to count the frequency of each noun, and output the N most frequent ones. Combining that with some other intelligent filtering mechanisms should do reasonably well at giving you the important keywords from the abstract.
For POS tagging, check out the POS tagger at http://nlp.stanford.edu/software/index.shtml
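To make the counting step concrete, here is a minimal sketch of the "dictionary/hash" part. It does not call the tagger itself. It assumes the text has already been tagged and that each token arrives as `word_TAG` separated by whitespace, with Penn Treebank tags where noun tags start with `NN`. That input format, the class name `NounFrequency`, and both method names are assumptions made for illustration, so adjust the parsing to whatever your tagger actually emits.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: counts nouns in already-tagged text.
class NounFrequency {

    // Assumes whitespace-separated "word_TAG" tokens with Penn Treebank tags,
    // where every noun tag (NN, NNS, NNP, NNPS) starts with "NN".
    static Map<String, Integer> countNouns(String taggedText) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : taggedText.trim().split("\\s+")) {
            int sep = token.lastIndexOf('_');
            if (sep <= 0 || sep == token.length() - 1) {
                continue; // skip tokens that lack a word or a tag
            }
            String word = token.substring(0, sep).toLowerCase(Locale.ROOT);
            String tag = token.substring(sep + 1);
            if (tag.startsWith("NN")) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns up to n words, most frequent first; ties are broken alphabetically.
    static List<String> topN(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }
}
```

You would call `countNouns` once per abstract (or merge the maps across abstracts) and then `topN(counts, N)` to get the candidate keywords. The extra filtering mentioned above, such as a stop list or a minimum count, would go between those two calls.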
However, if you are expecting a lot of multi-word terms in your corpus, then instead of extracting just nouns you could take the most frequent n-grams for n = 2 to 4.
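A rough sketch of the n-gram variant, under some simplifying assumptions: tokenization is a naive lowercase split on anything that is not a letter or digit, and n-grams are allowed to run across sentence boundaries. The class name `NGramFrequency` and the method `countNGrams` are made up for this example. A real pipeline would probably use a proper tokenizer and drop n-grams that start or end with a stop word.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: counts word n-grams in plain text.
class NGramFrequency {

    // Counts every run of n consecutive tokens for each n in [minN, maxN].
    static Map<String, Integer> countNGrams(String text, int minN, int maxN) {
        // Naive tokenization (an assumption): lowercase, split on non-alphanumerics.
        List<String> tokens = new ArrayList<>();
        for (String piece : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }

        Map<String, Integer> counts = new HashMap<>();
        for (int n = minN; n <= maxN; n++) {
            // Slide a window of n tokens over the token list.
            for (int i = 0; i + n <= tokens.size(); i++) {
                String gram = String.join(" ", tokens.subList(i, i + n));
                counts.merge(gram, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Calling `countNGrams(abstractText, 2, 4)` gives the counts for n = 2 to 4. Sorting the resulting map by value and keeping the first N entries then yields the most frequent multi-word terms.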