Forming Bigrams of words in list of sentences with Python

Views: 460 Published: 2020/5/2 5:49:19 Tags: python list list-comprehension nltk collocation


Problem description

I have a list of sentences:

text = ['cant railway station','citadel hotel',' police stn']

I need to form bigram pairs and store them in a variable. The problem is that when I do that, I get pairs of sentences instead of pairs of words. Here is what I did:

import nltk

text2 = [[word for word in line.split()] for line in text]
bigrams = nltk.bigrams(text2)
print(list(bigrams))

which yields

[(['cant', 'railway', 'station'], ['citadel', 'hotel']), (['citadel', 'hotel'], ['police', 'stn'])]

Here 'cant railway station' and 'citadel hotel' together form a single bigram. What I want is

[([cant],[railway]),([railway],[station]),([citadel],[hotel]), and so on...

The last word of the first sentence should not be paired with the first word of the second sentence. What should I do to make it work?

Recommended answer

Use a list comprehension and zip:

>>> text = ["this is a sentence", "so is this one"]
>>> bigrams = [b for l in text for b in zip(l.split(" ")[:-1], l.split(" ")[1:])]
>>> print(bigrams)
[('this', 'is'), ('is', 'a'), ('a', 'sentence'), ('so', 'is'), ('is', 'this'), ('this', 'one')]
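
Applied to the list from the question, the same idea might look like the sketch below. The helper name sentence_bigrams is made up for illustration and is not part of NLTK or of the original answer. It also swaps split(" ") for split() with no argument, because the third sentence in the question starts with a space and split(" ") would produce an empty first token there.

```python
def sentence_bigrams(sentences):
    """Return word bigrams that never cross a sentence boundary.

    Hypothetical helper for illustration; not an NLTK function.
    """
    pairs = []
    for sentence in sentences:
        # split() with no argument ignores leading, trailing and repeated whitespace
        words = sentence.split()
        # zip pairs each word with the word that follows it in the same sentence
        pairs.extend(zip(words, words[1:]))
    return pairs


text = ['cant railway station', 'citadel hotel', ' police stn']
bigrams = sentence_bigrams(text)
print(bigrams)
# [('cant', 'railway'), ('railway', 'station'), ('citadel', 'hotel'), ('police', 'stn')]
```

Because each sentence is paired up on its own, the last word of one sentence is never joined with the first word of the next, and a one-word sentence simply contributes no bigrams.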
