Determining proximity between 2 words in a sentence in Python
Question
I need to determine the proximity between two words in a sentence in Python. For example, in the following sentence:
the foo and the bar is foo bar
I want to determine the distance between the words foo and bar, that is, the number of words occurring between foo and bar.
Please note that foo and bar each occur more than once in the above sentence, which produces several different distance combinations.
Also, the order of the words shouldn't matter. What is the best way to determine the distance between these words?
Here is the code I am using:
sentence = "the foo and the bar is foo bar"
first_word_to_look = 'foo'
second_word_to_look = 'bar'
first_word = 0
second_word = 0
dist = 0
if first_word_to_look in sentence and second_word_to_look in sentence:
    # Count the words that precede the first occurrence of each target word
    first_word = len(sentence.split(first_word_to_look)[0].split())
    second_word = len(sentence.split(second_word_to_look)[0].split())
    if first_word < second_word:
        dist = second_word - first_word
    else:
        dist = first_word - second_word
print(dist)  # distance
The problem with the above code is that it only considers the first occurrence of each word. If later occurrences in the same sentence are closer together than the first ones, it doesn't take them into account.
What is the best way to determine the proximity? Is there a library in Python that can do this job better?
Recommended answer
You can split your sentence into a list of words and use the index method of list:
sentence = "the foo and the bar is foo bar"
words = sentence.split()
def get_distance(w1, w2):
    if w1 in words and w2 in words:
        return abs(words.index(w2) - words.index(w1))
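As a quick check, here is a sketch of the same idea with the sentence passed in as an argument. The name first_occurrence_distance is made up for illustration and is not part of the answer. It shows that list.index only finds the first occurrence of each word, so the closer pair at the end of the sentence is missed.

```python
def first_occurrence_distance(sentence, w1, w2):
    """Index difference between the first occurrences of w1 and w2, or None."""
    words = sentence.split()
    if w1 in words and w2 in words:
        # list.index returns the position of the first match only
        return abs(words.index(w2) - words.index(w1))
    return None


sentence = "the foo and the bar is foo bar"
print(first_occurrence_distance(sentence, "foo", "bar"))  # 3
```

The result is a difference of positions, so the number of words strictly between the two is one less (here 2: "and" and "the").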
Update to take all word occurrences into account:
import itertools
def get_distance(w1, w2):
    # Relies on the words list defined above
    if w1 in words and w2 in words:
        w1_indexes = [index for index, value in enumerate(words) if value == w1]
        w2_indexes = [index for index, value in enumerate(words) if value == w2]
        # Distance for every pairing of a w1 position with a w2 position
        distances = [abs(item[0] - item[1]) for item in itertools.product(w1_indexes, w2_indexes)]
        return {'min': min(distances), 'avg': sum(distances)/float(len(distances))}
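If only the minimum is needed, the pairwise product can be avoided. The following is a sketch of a different technique from the one above: a single pass that remembers the most recent position of each word. The function name min_word_distance is hypothetical, and the sketch assumes the two words are different and that whitespace splitting is good enough (no punctuation handling).

```python
def min_word_distance(sentence, w1, w2):
    """Smallest index difference between any occurrence of w1 and any of w2.

    Returns None if either word is missing. Subtract 1 from the result to get
    the number of words strictly between the closest pair.
    """
    last_w1 = None  # most recent position of w1 seen so far
    last_w2 = None  # most recent position of w2 seen so far
    best = None
    for i, word in enumerate(sentence.split()):
        if word == w1:
            last_w1 = i
        if word == w2:
            last_w2 = i
        # Only re-check the distance when one of the target words was just seen
        if word in (w1, w2) and last_w1 is not None and last_w2 is not None:
            d = abs(last_w1 - last_w2)
            if best is None or d < best:
                best = d
    return best


sentence = "the foo and the bar is foo bar"
print(min_word_distance(sentence, "foo", "bar"))  # 1
```

This works because the closest pair is always adjacent when the occurrences of both words are listed in sentence order, so comparing each occurrence with the latest occurrence of the other word is enough. The order of the arguments does not affect the result.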