Generate bigrams with NLTK
Problem description
I am trying to produce a list of bigrams for a given sentence. For example, if I type,
To be or not to be
I want the program to generate
to be, be or, or not, not to, to be
I tried the following code, but it just gives me
<generator object bigrams at 0x0000000009231360>
This is my code:
import nltk
bigrm = nltk.bigrams(text)
print(bigrm)
So how do I get what I want? I want a list of word pairs like the one above (to be, be or, or not, not to, to be).
Recommended answer
nltk.bigrams() returns an iterator (specifically, a generator) of bigrams. If you want a list, pass the iterator to list(). It also expects a sequence of items to generate bigrams from, so you have to split the text before passing it (if you have not already done so):
bigrm = list(nltk.bigrams(text.split()))
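To see why printing the result directly shows a generator object rather than the pairs, here is a minimal plain-Python sketch that does not require NLTK. The helper pair_up is hypothetical and only imitates the pairing behaviour described above using zip; it is not NLTK's implementation. Lowercasing the text is also an assumption, made so the pairs match the lowercase output requested in the question.

```python
def pair_up(tokens):
    """Yield adjacent token pairs lazily.

    Hypothetical stand-in for nltk.bigrams(tokens), for illustration only.
    """
    for first, second in zip(tokens, tokens[1:]):
        yield (first, second)


text = "To be or not to be"
# Assumption: lowercase first so the result matches the desired output.
tokens = text.lower().split()

pairs = pair_up(tokens)
print(pairs)  # shows a generator object, not the pairs themselves

bigrm = list(pairs)  # materialize the lazy pairs into a list
print(bigrm)
```

Passing the unsplit string instead of tokens would pair up individual characters, which is why the split step matters.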
To print them out separated by commas, you could do this (in Python 3):
print(*map(' '.join, bigrm), sep=', ')
On Python 2, you could use, for example:
print ', '.join(' '.join((a, b)) for a, b in bigrm)
Note that if you only want to print the bigrams, you do not need to build a list; you can use the iterator directly.
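As a rough illustration of that last point, the sketch below joins the pairs straight from a generator without building a list first. The function adjacent_pairs is a hypothetical stand-in for nltk.bigrams, written in plain Python 3 so it runs without NLTK installed. It also shows a caveat of skipping the list: a generator can be consumed only once.

```python
def adjacent_pairs(tokens):
    # Hypothetical generator standing in for nltk.bigrams(tokens).
    for first, second in zip(tokens, tokens[1:]):
        yield (first, second)


tokens = "to be or not to be".split()
gen = adjacent_pairs(tokens)

# Consume the generator directly; no intermediate list is created.
line = ', '.join(' '.join(pair) for pair in gen)
print(line)

# The generator is exhausted now, so a second pass yields nothing.
leftover = list(gen)
print(leftover)
```

If the pairs are needed more than once, converting to a list up front, as in the answer, is the simpler choice.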