NLTK: corpus-level bleu vs sentence-level BLEU score
Problem description
I have imported nltk in Python to calculate the BLEU score on Ubuntu. I understand how the sentence-level BLEU score works, but I don't understand how the corpus-level BLEU score works.
Below is my code for corpus-level BLEU score:
import nltk
hypothesis = ['This', 'is', 'cat']
reference = ['This', 'is', 'a', 'cat']
BLEUscore = nltk.translate.bleu_score.corpus_bleu([reference], [hypothesis], weights = [1])
print(BLEUscore)
For some reason, the BLEU score is 0 for the above code. I was expecting a corpus-level BLEU score of at least 0.5.
Here is my code for the sentence-level BLEU score:
import nltk
hypothesis = ['This', 'is', 'cat']
reference = ['This', 'is', 'a', 'cat']
BLEUscore = nltk.translate.bleu_score.sentence_bleu([reference], hypothesis, weights = [1])
print(BLEUscore)
Here the sentence-level BLEU score is 0.71, which I expect, taking into account the brevity penalty and the missing word "a". However, I don't understand how the corpus-level BLEU score works.
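The 0.71 can be reproduced by hand. With weights = [1] only unigram precision counts, and every hypothesis token occurs in the reference, so p1 = 3/3 = 1. The score is therefore just the brevity penalty BP = exp(1 - r/c), with reference length r = 4 and hypothesis length c = 3. A minimal sketch of that arithmetic, written in plain Python from the BLEU definition (not NLTK's code):

```python
import math
from collections import Counter

hypothesis = ['This', 'is', 'cat']
reference = ['This', 'is', 'a', 'cat']

# Clipped unigram precision: each hypothesis word is credited at most as
# often as it appears in the reference.
hyp_counts = Counter(hypothesis)
ref_counts = Counter(reference)
matches = sum(min(n, ref_counts[tok]) for tok, n in hyp_counts.items())
p1 = matches / len(hypothesis)  # 3 / 3 = 1.0

# Brevity penalty: applied because the hypothesis (c=3) is shorter than the reference (r=4).
c, r = len(hypothesis), len(reference)
bp = 1.0 if c > r else math.exp(1 - r / c)

# With weights=[1], BLEU = BP * exp(1 * log(p1)) = BP * p1.
score = bp * math.exp(math.log(p1))
print(round(score, 4))  # 0.7165
```

So the "missing word" only lowers the score through the length penalty here; the unigram precision itself is perfect.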
Any help would be appreciated.
TL;DR:
>>> import nltk
>>> hypothesis = ['This', 'is', 'cat']
>>> reference = ['This', 'is', 'a', 'cat']
>>> references = [reference] # list of references for 1 sentence.
>>> list_of_references = [references] # list of references for all sentences in corpus.
>>> list_of_hypotheses = [hypothesis] # list of hypotheses that corresponds to list of references.
>>> nltk.translate.bleu_score.corpus_bleu(list_of_references, list_of_hypotheses)
0.6025286104785453
>>> nltk.translate.bleu_score.sentence_bleu(references, hypothesis)
0.6025286104785453
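(Note: at the time of writing, you had to pull the latest version of NLTK from the develop branch to get a stable version of the BLEU score implementation. Later releases changed how zero n-gram overlaps are handled: without a smoothing function, a hypothesis with no matching trigrams or 4-grams, like this one, triggers a warning and a score close to 0, so the exact values above are version-dependent.)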
In Long:
Actually, if there's only one reference and one hypothesis in your whole corpus, both corpus_bleu() and sentence_bleu() should return the same value, as shown in the example above.
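What "corpus-level" means in the standard BLEU recipe is that clipped n-gram matches and lengths are summed over all sentences before the precisions and the brevity penalty are computed; it is not an average of per-sentence scores. The sketch below illustrates that recipe in plain Python. It is not NLTK's source: the helper names `toy_corpus_bleu` and `toy_sentence_bleu` are hypothetical, smoothing is omitted, and weights default to (0.5, 0.5) so this example has no zero precisions.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def toy_corpus_bleu(list_of_references, hypotheses, weights=(0.5, 0.5)):
    """Illustrative corpus BLEU: pool clipped counts and lengths over all sentences.

    Hypothetical helper; assumes every pooled precision is non-zero (no smoothing).
    """
    numerators, denominators = Counter(), Counter()
    hyp_len = ref_len = 0
    for references, hypothesis in zip(list_of_references, hypotheses):
        for n in range(1, len(weights) + 1):
            hyp_ngrams = ngram_counts(hypothesis, n)
            # Clip each n-gram by its maximum count in any single reference.
            max_ref = Counter()
            for ref in references:
                max_ref |= ngram_counts(ref, n)
            numerators[n] += sum(min(c, max_ref[g]) for g, c in hyp_ngrams.items())
            denominators[n] += max(1, sum(hyp_ngrams.values()))
        hyp_len += len(hypothesis)
        # Closest reference length (ties go to the shorter reference).
        ref_len += min((abs(len(r) - len(hypothesis)), len(r)) for r in references)[1]
    precisions = [numerators[n] / denominators[n] for n in range(1, len(weights) + 1)]
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(sum(w * math.log(p) for w, p in zip(weights, precisions)))


def toy_sentence_bleu(references, hypothesis, weights=(0.5, 0.5)):
    # Same trick as NLTK's sentence_bleu: score a one-sentence corpus.
    return toy_corpus_bleu([references], [hypothesis], weights)


hyp1, refs1 = ['This', 'is', 'cat'], [['This', 'is', 'a', 'cat']]
hyp2, refs2 = ['the', 'dog', 'barks', 'loudly'], [['the', 'dog', 'barks']]

single = toy_sentence_bleu(refs1, hyp1)
print(single == toy_corpus_bleu([refs1], [hyp1]))  # True

# With two sentences, the pooled score differs from the mean of sentence scores.
corpus = toy_corpus_bleu([refs1, refs2], [hyp1, hyp2])
average = (toy_sentence_bleu(refs1, hyp1) + toy_sentence_bleu(refs2, hyp2)) / 2
print(corpus, average)
```

Pooling means a short hypothesis in one sentence can be offset by a longer one elsewhere (here the total lengths are equal, so the corpus brevity penalty is 1), which is why corpus_bleu over many sentences is generally not the mean of sentence_bleu scores.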
In the code, we see that sentence_bleu is actually a thin wrapper around corpus_bleu that scores the sentence as a one-sentence corpus:
def sentence_bleu(references, hypothesis, weights=(0.25, 0.25, 0.25, 0.25),
                  smoothing_function=None):
    return corpus_bleu([references], [hypothesis], weights, smoothing_function)
And if we look at the parameters for sentence_bleu:
def sentence_bleu(references, hypothesis, weights=(0.25, 0.25, 0.25, 0.25),
                  smoothing_function=None):
    """
    :param references: reference sentences
    :type references: list(list(str))
    :param hypothesis: a hypothesis sentence
    :type hypothesis: list(str)
    :param weights: weights for unigrams, bigrams, trigrams and so on
    :type weights: list(float)
    :return: The sentence-level BLEU score.
    :rtype: float
    """
The input for sentence_bleu's references is a list(list(str)).
So if you have a sentence string, e.g. "This is a cat", you have to tokenize it to get a list of strings, ["This", "is", "a", "cat"]. Since sentence_bleu() allows multiple references, references has to be a list of lists of strings. For example, if you have a second reference, "This is a feline", your input to sentence_bleu() would be:
from nltk.translate.bleu_score import sentence_bleu
references = [ ["This", "is", "a", "cat"], ["This", "is", "a", "feline"] ]
hypothesis = ["This", "is", "cat"]
sentence_bleu(references, hypothesis)
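Multiple references matter because of clipping: each hypothesis n-gram is credited at most as many times as it appears in any single reference. A small sketch of that modified (clipped) unigram precision, written from the BLEU definition rather than taken from NLTK; `clipped_unigram_precision` is a hypothetical name:

```python
from collections import Counter


def clipped_unigram_precision(references, hypothesis):
    """Modified unigram precision of one hypothesis against several references (illustrative)."""
    hyp_counts = Counter(hypothesis)
    # For each word, the maximum number of times it appears in any one reference.
    max_ref_counts = Counter()
    for ref in references:
        for word, count in Counter(ref).items():
            max_ref_counts[word] = max(max_ref_counts[word], count)
    clipped = sum(min(count, max_ref_counts[word]) for word, count in hyp_counts.items())
    return clipped / len(hypothesis)


references = [["This", "is", "a", "cat"], ["This", "is", "a", "feline"]]
print(clipped_unigram_precision(references, ["This", "is", "cat"]))  # 1.0

# "the" is credited only twice: the most any single reference contains it.
print(clipped_unigram_precision([["the", "cat"], ["the", "the", "dog"]],
                                ["the", "the", "the", "cat"]))  # 0.75
```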
As for corpus_bleu(), its list_of_references parameter is basically a list of whatever sentence_bleu() takes as references, one entry per hypothesis:
def corpus_bleu(list_of_references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25),
                smoothing_function=None):
    """
    :param list_of_references: a corpus of lists of reference sentences, w.r.t. hypotheses
    :type list_of_references: list(list(list(str)))
    :param hypotheses: a list of hypothesis sentences
    :type hypotheses: list(list(str))
    :param weights: weights for unigrams, bigrams, trigrams and so on
    :type weights: list(float)
    :return: The corpus-level BLEU score.
    :rtype: float
    """
Besides looking at the doctests in nltk/translate/bleu_score.py, you can also look at the unit tests in nltk/test/unit/translate/test_bleu_score.py to see how to use each component of bleu_score.py.
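The extra level of nesting in list_of_references is what produced the 0 in the question. corpus_bleu([reference], [hypothesis]) pairs hypothesis with reference itself as its list of references, so each "reference" is a single string such as 'This', and n-grams taken over a string are characters. The sketch below shows the effect on unigram matches; `unigram_matches` is a hypothetical helper in plain Python, not an NLTK function.

```python
from collections import Counter


def unigram_matches(references, hypothesis):
    """Clipped unigram matches; each reference is iterated element by element."""
    hyp_counts = Counter((tok,) for tok in hypothesis)
    max_ref = Counter()
    for ref in references:
        # A token list yields word unigrams; a plain string yields character unigrams.
        max_ref |= Counter((item,) for item in ref)
    return sum(min(c, max_ref[g]) for g, c in hyp_counts.items())


hypothesis = ['This', 'is', 'cat']
reference = ['This', 'is', 'a', 'cat']

# Correct nesting: list_of_references[i] is the list of references for sentence i.
list_of_references = [[reference]]
# The question's nesting: sentence 0 gets four "references": 'This', 'is', 'a', 'cat'.
wrong_list_of_references = [reference]

print(unigram_matches(list_of_references[0], hypothesis))        # 3
print(unigram_matches(wrong_list_of_references[0], hypothesis))  # 0
```

With the real library, the fix is the one in the TL;DR: wrap the reference once for "one sentence's references" and once more for "the corpus", i.e. corpus_bleu([[reference]], [hypothesis], ...).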
By the way, since sentence_bleu is imported as bleu in [nltk.translate.__init__.py](https://github.com/nltk/nltk/blob/develop/nltk/translate/__init__.py#L21), using
from nltk.translate import bleu
would be the same as:
from nltk.translate.bleu_score import sentence_bleu
and in code:
>>> from nltk.translate import bleu
>>> from nltk.translate.bleu_score import sentence_bleu
>>> from nltk.translate.bleu_score import corpus_bleu
>>> bleu == sentence_bleu
True
>>> bleu == corpus_bleu
False