Generate bigrams with NLTK

Views: 158. Published: 2020/5/18 1:17:18. Tags: python, nltk, n-gram


Problem description

I am trying to produce a list of bigrams for a given sentence. For example, if I type

    To be or not to be

I want the program to generate

     to be, be or, or not, not to, to be

I tried the following code, but it just gives me

<generator object bigrams at 0x0000000009231360>

This is my code:

    import nltk
    bigrm = nltk.bigrams(text)
    print(bigrm)

So how do I get what I want? I want a list of word pairs like the one above (to be, be or, or not, not to, to be).

Recommended answer

nltk.bigrams() returns an iterator (specifically, a generator) of bigrams, which is why print() shows a generator object instead of the pairs. If you want a list, pass the iterator to list(). It also expects a sequence of items to generate bigrams from, so you have to split the text into words before passing it (if you have not already done so):

    bigrm = list(nltk.bigrams(text.split()))
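As a rough illustration of the points above, the sketch below materializes the bigrams as a list and shows what happens when the raw string is passed without splitting. The `try`/`except` fallback is an assumption added so the snippet also runs where NLTK is not installed; it is a stand-in, not NLTK's own code. Lowercasing with `lower()` is likewise an assumption, made only because the desired output in the question is all lowercase.

```python
try:
    from nltk import bigrams  # the real NLTK function, used when NLTK is installed
except ImportError:
    def bigrams(sequence):
        """Stand-in for illustration only (not NLTK's code): yield adjacent pairs."""
        items = list(sequence)
        yield from zip(items, items[1:])

text = "To be or not to be"

# Lowercase first so the pairs match the all-lowercase output asked for,
# then split on whitespace to get a list of word tokens.
tokens = text.lower().split()

gen = bigrams(tokens)   # lazy: printing this shows a generator object
pairs = list(gen)       # materialize the pairs as a list of 2-tuples
print(pairs)

# Passing a raw string instead of tokens pairs up characters, not words.
char_pairs = list(bigrams("to be"))
print(char_pairs[:3])
```

The second call is the reason the answer splits the text first: a string is itself a sequence of characters, so without `split()` the bigrams are character pairs.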

To print them out separated by commas, you could (in Python 3):

    print(*map(' '.join, bigrm), sep=', ')
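The one-liner above packs three steps together. The sketch below unpacks them; `bigrm` is written out as a literal list here (an assumption for illustration) so the snippet runs without NLTK, and the names `phrases` and `line` are hypothetical.

```python
# The bigram list, written out literally so this runs without NLTK.
bigrm = [("to", "be"), ("be", "or"), ("or", "not"), ("not", "to"), ("to", "be")]

# Step 1: ' '.join turns each 2-tuple into a single "word word" string.
phrases = list(map(" ".join, bigrm))

# Step 2: the * unpacks the phrases into separate print() arguments,
# and sep=', ' replaces the default single space between them.
print(*phrases, sep=", ")

# Equivalent approach: build the whole line as one string, then print it.
line = ", ".join(phrases)
print(line)
```

Building the string with `", ".join(...)` is handy when the result is needed for something other than printing, such as writing to a file.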

If you are on Python 2, then for example:

    print ', '.join(' '.join((a, b)) for a, b in bigrm)

Note that just for printing you do not need to build a list; you can iterate over the generator directly.
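One thing to keep in mind when skipping the list is that an iterator can only be consumed once. The sketch below shows this with a dependency-free stand-in: `adjacent_pairs` is a hypothetical helper built on `zip`, not NLTK's implementation, but for a plain list of tokens it yields the same pairs, and the generator returned by `nltk.bigrams()` is single-pass in the same way.

```python
def adjacent_pairs(tokens):
    """Hypothetical helper: lazily yield (token, next_token) pairs."""
    return zip(tokens, tokens[1:])

tokens = "to be or not to be".split()

# Consume the iterator directly, without building a list first.
pairs = adjacent_pairs(tokens)
first_pass = ", ".join(" ".join(p) for p in pairs)
print(first_pass)

# The iterator is now exhausted, so a second pass yields nothing.
second_pass = ", ".join(" ".join(p) for p in pairs)
print(repr(second_pass))
```

If the bigrams are needed more than once (for example, to print them and then count them), converting to a list first avoids this pitfall.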
