nltk language model (ngram): calculate the probability of a word from context
Problem description
I am using Python and NLTK to build a language model as follows:
from nltk.corpus import brown
from nltk.model import NgramModel  # defined in nltk/model/ngram.py (NLTK 2.x)
from nltk.probability import LidstoneProbDist, WittenBellProbDist

# Lidstone smoothing with gamma = 0.2
estimator = lambda fdist, bins: LidstoneProbDist(fdist, 0.2)
# Trigram model trained on the Brown corpus "news" category
lm = NgramModel(3, brown.words(categories='news'), estimator)

# Thanks to miku, I fixed this problem
print lm.prob("word", ["This is a context which generates a word"])
# => 0.00493261081006

# But I got another program like this one...
print lm.prob("b", ["This is a context which generates a word"])
But it doesn't seem to work. The result is as follows:
>>> print lm.prob("word", "This is a context which generates a word")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.6/dist-packages/nltk/model/ngram.py", line 79, in prob
    return self._alpha(context) * self._backoff.prob(word, context[1:])
  File "/usr/local/lib/python2.6/dist-packages/nltk/model/ngram.py", line 79, in prob
    return self._alpha(context) * self._backoff.prob(word, context[1:])
  File "/usr/local/lib/python2.6/dist-packages/nltk/model/ngram.py", line 82, in prob
    "context %s" % (word, ' '.join(context)))
TypeError: not all arguments converted during string formatting
Can anyone help me out? Thanks!
Solution
Quick fix:
print lm.prob("word", ["This is a context which generates a word"])
# => 0.00493261081006
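The quick fix works because the context is passed as a list rather than a bare string. Note that the list above holds a single element, so the whole sentence is treated as one token; a context is normally a list of separate tokens, for example the result of `split()`. To illustrate the idea without depending on a particular NLTK version, here is a minimal plain-Python sketch of a trigram model with Lidstone smoothing (gamma = 0.2, as in the question). It is not NLTK's implementation: the function names `train_trigram_counts` and `lidstone_prob` and the toy sentence are assumptions made up for this example, and it is written for Python 3.

```python
from collections import Counter, defaultdict


def train_trigram_counts(tokens):
    """Count which word follows each pair of consecutive tokens."""
    counts = defaultdict(Counter)
    for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
        counts[(w1, w2)][w3] += 1
    return counts


def lidstone_prob(counts, vocab_size, word, context, gamma=0.2):
    """Lidstone-smoothed P(word | last two tokens of context).

    `context` must be a sequence of tokens, not a single string.
    """
    if isinstance(context, str):
        raise TypeError("context must be a list of tokens, e.g. text.split()")
    key = tuple(context[-2:])  # a trigram model conditions on two tokens
    following = counts.get(key, Counter())
    total = sum(following.values())
    # (count + gamma) / (total + gamma * number_of_bins)
    return (following[word] + gamma) / (total + gamma * vocab_size)


# Toy training text (hypothetical, stands in for the Brown corpus)
tokens = "this is a context which generates a word and this is a test".split()
counts = train_trigram_counts(tokens)
vocab_size = len(set(tokens))

# Pass the context as a list of tokens
p = lidstone_prob(counts, vocab_size, "context", "this is a".split())
print(p)
```

Calling `lidstone_prob(counts, vocab_size, "context", "this is a")` with a bare string raises a `TypeError` with an explicit message, which is easier to diagnose than a failure deep inside the model.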