Kneser-Ney smoothing of trigrams using Python NLTK


Problem description

I'm trying to smooth a set of n-gram probabilities with Kneser-Ney smoothing using the Python NLTK. Unfortunately, the documentation on this is rather sparse.

What I'm trying to do is this: I parse a text into a list of trigram tuples. From this list I create a FreqDist and then use that FreqDist to calculate a KN-smoothed distribution.

I'm pretty sure, though, that the result is totally wrong. When I sum up the individual probabilities I get something way beyond 1. Take this code example:

import nltk

# NB: nltk.trigrams() iterates over its argument, so passing a raw
# string like this yields *character* trigrams rather than word trigrams.
ngrams = nltk.trigrams("What a piece of work is man! how noble in reason! how infinite in faculty! in \
form and moving how express and admirable! in action how like an angel! in apprehension how like a god! \
the beauty of the world, the paragon of animals!")

# Count the trigrams and build a Kneser-Ney smoothed distribution from them.
freq_dist = nltk.FreqDist(ngrams)
kneser_ney = nltk.KneserNeyProbDist(freq_dist)

# Sum the smoothed probabilities over all samples in the distribution.
prob_sum = 0
for i in kneser_ney.samples():
    prob_sum += kneser_ney.prob(i)
print(prob_sum)

The output is "41.51696428571428". Depending on the corpus size, this value grows arbitrarily large, which means that whatever prob() returns, it is anything but a probability distribution.

Looking at the NLTK code, I would say that the implementation is questionable. Maybe I just don't understand how the code is supposed to be used. In that case, could you give me a hint? Otherwise: do you know of any working Python implementation? I don't really want to implement it myself.

Recommended answer

Kneser-Ney (also have a look at Goodman and Chen for a great survey of different smoothing techniques) is a quite complicated smoothing method that only a few packages I am aware of have got right. I am not aware of any Python implementation, but you can definitely try SRILM if you just need probabilities, etc.
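
For what it's worth, newer NLTK releases ship an nltk.lm package (added around NLTK 3.4, well after this question was asked) whose KneserNeyInterpolated model behaves like a proper conditional distribution. Below is a minimal sketch under that assumption; the toy sentences are just the question's text split into word lists:

from nltk.lm import KneserNeyInterpolated
from nltk.lm.preprocessing import padded_everygram_pipeline

sentences = [
    "what a piece of work is man how noble in reason".split(),
    "how infinite in faculty in form and moving how express and admirable".split(),
]

# Build padded unigram-to-trigram training data plus the vocabulary.
train, vocab = padded_everygram_pipeline(3, sentences)

lm = KneserNeyInterpolated(order=3)
lm.fit(train, vocab)

# score() returns the conditional probability P(word | context).
print(lm.score("piece", ["what", "a"]))

# For a fixed context, the conditional probabilities over the whole
# vocabulary should sum to approximately 1.
print(sum(lm.score(w, ["what", "a"]) for w in lm.vocab))

(On the non-Python side, SRILM's ngram-count tool has a -kndiscount option for Kneser-Ney discounting.)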

  • There is a good chance that your sample has words that did not occur in the training data (a.k.a. out-of-vocabulary (OOV) words), which, if not handled properly, can mess up the probabilities you get. Perhaps this is what causes the outrageously large and invalid probabilities? One common workaround is sketched below.
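
A hedged sketch of that workaround: map rare training words to an <UNK> token before counting, so unseen test words can be mapped to <UNK> as well. The helper name and threshold here are illustrative, not part of NLTK:

from collections import Counter

def mask_rare_words(tokens, min_count=2, unk="<UNK>"):
    # Illustrative helper: words seen fewer than min_count times become <UNK>.
    counts = Counter(tokens)
    return [t if counts[t] >= min_count else unk for t in tokens]

tokens = "how noble in reason how infinite in faculty".split()
print(mask_rare_words(tokens))
# ['how', '<UNK>', 'in', '<UNK>', 'how', '<UNK>', 'in', '<UNK>']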

