NLTK language model (n-gram): calculating the probability of a word from its context

Question

I am using Python and NLTK to build a language model as follows:

from nltk.corpus import brown
from nltk.model import NgramModel  # defined in nltk/model/ngram.py (see traceback)
from nltk.probability import LidstoneProbDist, WittenBellProbDist

# Lidstone smoothing with gamma = 0.2
estimator = lambda fdist, bins: LidstoneProbDist(fdist, 0.2)
# Trigram model trained on the 'news' category of the Brown corpus
lm = NgramModel(3, brown.words(categories='news'), estimator)

# Thanks to miku, I fixed this problem
print lm.prob("word", ["This is a context which generates a word"])
# => 0.00493261081006

# But I got another problem like this one...
print lm.prob("b", ["This is a context which generates a word"])

But it doesn't seem to work. The result is as follows:

>>> print lm.prob("word", "This is a context which generates a word")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.6/dist-packages/nltk/model/ngram.py", line 79, in prob
    return self._alpha(context) * self._backoff.prob(word, context[1:])
  File "/usr/local/lib/python2.6/dist-packages/nltk/model/ngram.py", line 79, in prob
    return self._alpha(context) * self._backoff.prob(word, context[1:])
  File "/usr/local/lib/python2.6/dist-packages/nltk/model/ngram.py", line 82, in prob
    "context %s" % (word, ' '.join(context)))
TypeError: not all arguments converted during string formatting

Can anyone help me out? Thanks!

Solution

Quick fix: pass the context as a list rather than as a plain string:

print lm.prob("word", ["This is a context which generates a word"])
# => 0.00493261081006
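
As a rough illustration of why the type of the context matters, the sketch below mimics one operation visible in the traceback (`context[1:]`) in plain Python. The helper `backoff_contexts` is hypothetical and not part of NLTK; it only shows that slicing a string drops one character at a time, while slicing a list of tokens drops one word at a time.

```python
# Illustrative only: mimics the context[1:] slicing seen in the traceback.
# backoff_contexts is a hypothetical helper, not NLTK's actual code.

def backoff_contexts(context):
    """Return the successively shorter contexts a backoff step would try."""
    steps = []
    while context:
        steps.append(context)
        context = context[1:]  # drop the oldest item, as in the traceback
    return steps

sentence = "a context word"

# A plain string is sliced character by character.
as_string = backoff_contexts(sentence)

# A list of tokens is sliced word by word.
as_tokens = backoff_contexts(sentence.split())

print(as_string[:3])
print(as_tokens)
```

If the model expects a sequence of words, splitting the sentence (for example with `sentence.split()`) would give one token per word, whereas the quick fix above passes the whole sentence as a single token.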
