nltk语言模型(ngram)根据上下文计算单词的概率 [英] nltk language model (ngram) calculate the prob of a word from context

查看:219
本文介绍了nltk语言模型(ngram)根据上下文计算单词的概率的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用Python和NLTK构建语言模型,如下所示:

I am using Python and NLTK to build a language model as follows:

from nltk.corpus import brown
from nltk.probability import LidstoneProbDist, WittenBellProbDist
estimator = lambda fdist, bins: LidstoneProbDist(fdist, 0.2)
lm = NgramModel(3, brown.words(categories='news'), estimator)
# Thanks to miku, I fixed this problem
print lm.prob("word", ["This is a context which generates a word"])
>> 0.00493261081006
# But I got another program like this one...
print lm.prob("b", ["This is a context which generates a word"]) 

但是它似乎不起作用.结果如下:

But it doesn't seem to work. The result is as follows:

>>> print lm.prob("word", "This is a context which generates a word")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.6/dist-packages/nltk/model/ngram.py", line 79, in prob
    return self._alpha(context) * self._backoff.prob(word, context[1:])
  File "/usr/local/lib/python2.6/dist-packages/nltk/model/ngram.py", line 79, in prob
    return self._alpha(context) * self._backoff.prob(word, context[1:])
  File "/usr/local/lib/python2.6/dist-packages/nltk/model/ngram.py", line 82, in prob
    "context %s" % (word, ' '.join(context)))
TypeError: not all arguments converted during string formatting

有人可以帮我吗?谢谢!

Can anyone help me out? Thanks!

推荐答案

快速修复:

print lm.prob("word", ["This is a context which generates a word"])
# => 0.00493261081006

这篇关于nltk语言模型(ngram)根据上下文计算单词的概率的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆