Determining proximity between 2 words in a sentence in Python


Problem description

I need to determine the proximity between two words in a sentence in Python. For example, in the following sentence:

the foo and the bar is foo bar

I want to determine the distance between the words foo and bar (that is, the number of words occurring between foo and bar).

Please note that the words foo and bar each occur more than once in the sentence above, which produces several different distance combinations.

Also, the order of the words shouldn't matter. What is the best way to determine the distance between these words?

Here is the code I am using:

sentence = "the foo and the bar is foo bar"

first_word_to_look = 'foo'
second_word_to_look = 'bar'

first_word = 0
second_word = 0
dist = 0

if first_word_to_look in sentence and second_word_to_look in sentence:

    first_word = len(sentence.split(first_word_to_look)[0].split())
    second_word = len(sentence.split(second_word_to_look)[0].split())

    if first_word < second_word:
        dist = second_word-first_word
    else:
        dist = first_word-second_word

print(dist)  # distance

The problem with the above code is that it only considers the first occurrence of each word. If later occurrences in the same sentence are closer together than the first ones, it doesn't take them into account.

What is the best way to determine the proximity? Is there any library in Python that can do this job better?

Recommended answer

You can split the sentence into a list of words and use the list's index method:

sentence = "the foo and the bar is foo bar"
words = sentence.split()

def get_distance(w1, w2):
    # Uses only the first occurrence of each word; returns None if either is missing
    if w1 in words and w2 in words:
        return abs(words.index(w2) - words.index(w1))
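
To make the behaviour concrete, here is a minimal self-contained sketch of the same idea, run on the sentence from the question. The function name `first_occurrence_distance` and the extra `sentence` parameter are assumptions added for illustration; they are not part of the original answer.

```python
def first_occurrence_distance(sentence, w1, w2):
    """Index distance between the first occurrences of w1 and w2, or None."""
    words = sentence.split()
    if w1 in words and w2 in words:
        # list.index returns the position of the first match only
        return abs(words.index(w2) - words.index(w1))
    return None


sentence = "the foo and the bar is foo bar"
print(first_occurrence_distance(sentence, "foo", "bar"))  # 3 (positions 1 and 4)
print(first_occurrence_distance(sentence, "bar", "foo"))  # 3, argument order does not matter
print(first_occurrence_distance(sentence, "foo", "baz"))  # None, "baz" is absent
```

Note that the result is a difference of positions, so the number of words strictly between the two is one less (2 here). The closer pair at positions 6 and 7 is not seen, because only first occurrences are used.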

Update to take all occurrences of both words into account:

import itertools

def get_distance(w1, w2):
    if w1 in words and w2 in words:
        # Positions of every occurrence of each word
        w1_indexes = [index for index, value in enumerate(words) if value == w1]
        w2_indexes = [index for index, value in enumerate(words) if value == w2]
        # Distance for every (w1 position, w2 position) pair
        distances = [abs(item[0] - item[1]) for item in itertools.product(w1_indexes, w2_indexes)]
        return {'min': min(distances), 'avg': sum(distances)/float(len(distances))}
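
For the sample sentence, foo is at positions 1 and 6 and bar at positions 4 and 7, so the pairwise distances are 3, 6, 2 and 1, giving a minimum of 1 and an average of 3.0. If only the minimum is needed, comparing every pair is not required. The sketch below is an alternative, not the answer's method: it scans the words once and remembers the latest position of each word. The name `min_distance` is hypothetical, and the sketch assumes `w1` and `w2` are different words.

```python
def min_distance(sentence, w1, w2):
    """Smallest index distance between any occurrence of w1 and any of w2.

    Returns None if either word is missing. Assumes w1 != w2.
    """
    last_w1 = last_w2 = None  # most recent position seen for each word
    best = None
    for i, word in enumerate(sentence.split()):
        if word == w1:
            last_w1 = i
        elif word == w2:
            last_w2 = i
        else:
            continue  # not one of the two words, nothing to update
        if last_w1 is not None and last_w2 is not None:
            d = abs(last_w1 - last_w2)
            if best is None or d < best:
                best = d
    return best


print(min_distance("the foo and the bar is foo bar", "foo", "bar"))  # 1
```

This works because the closest pair is always made of two occurrences with no other occurrence of either word between them, so checking each new occurrence against the latest position of the other word is enough.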
