Iterate one list of synsets over another


Problem description


I have two lists of WordNet synsets, s1 and s2. For each synset in s1, I want to find its maximum path similarity score against the synsets in s2, so that the output has the same length as s1. For example, if s1 contains 4 synsets, then the output should have length 4.

So far I have tried the following code:

import numpy as np
import nltk
from nltk.corpus import wordnet as wn
import pandas as pd

# two lists of WordNet synsets (s1, s2)

s1 = [wn.synset('be.v.01'),
 wn.synset('angstrom.n.01'),
 wn.synset('trial.n.02'),
 wn.synset('function.n.01')]

s2 = [wn.synset('use.n.01'),
 wn.synset('function.n.01'),
 wn.synset('check.n.01'),
 wn.synset('code.n.01'),
 wn.synset('inch.n.01'),
 wn.synset('be.v.01'),
 wn.synset('correct.v.01')]
 
# define a function that finds, for each synset in s1, the highest path similarity score against s2; the output has the same length as s1

ps_list = []
def similarity_score(s1, s2):
    for word1 in s1:
        best = max(wn.path_similarity(word1, word2) for word2 in s2)
        ps_list.append(best)
    return ps_list

similarity_score(s1, s2)

But it raises the following error:

'>' not supported between instances of 'NoneType' and 'float'

I can't figure out what is going wrong with the code. Would anyone care to take a look and share his/her insights on the for loop? It would be really appreciated.

Thank you.

The full error traceback is:

TypeError                                 Traceback (most recent call last)
<ipython-input-73-4506121e17dc> in <module>()
     38     return word_list
     39 
---> 40 s = similarity_score(s1, s2)
     41 
     42 

<ipython-input-73-4506121e17dc> in similarity_score(s1, s2)
     33 def similarity_score(s1, s2):
     34     for word1 in s1:
---> 35         best = max(wn.path_similarity(word1, word2) for word2 in s2)
     36         word_list.append(best)
     37 

TypeError: '>' not supported between instances of 'NoneType' and 'float'
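
This message is what Python 3's `max` produces when a `None` appears among floats, which suggests that at least one `path_similarity` call returned `None` (NLTK documents `None` as the result when no connecting path is found). A minimal sketch that reproduces the error without NLTK; the score values are made up for illustration:

```python
# Made-up scores standing in for path_similarity results; None marks "no path found".
scores = [0.25, None, 0.5]

try:
    max(scores)
    error_message = None
except TypeError as exc:
    # max() compares elements pairwise, and None cannot be ordered against a float.
    error_message = str(exc)

print(error_message)
```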

[edit] I came up with this temporary solution:

s_list = []
for word1 in s1:
    best = [word1.path_similarity(word2) for word2 in s2]
    b = pd.Series(best).max()
    s_list.append(b)

It's not elegant, but it works. Does anyone have a better solution or a handy trick to handle this?

Solution

I have no experience with the nltk module, but from reading the docs I can see that path_similarity is a method of whatever object wn.synset(args) returns. You are instead treating it as a function.

What you should be doing is something like this:

ps_list = []
for word1 in s1:
    best = max(word1.path_similarity(word2) for word2 in s2)  # path_similarity is a method of each synset
    ps_list.append(best)
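
Note that `path_similarity` may return `None` for a pair (NLTK returns `None` when it finds no connecting path), so `max` can still fail on a row that contains such a pair. The following is a sketch under stated assumptions, not a definitive implementation: `similarity_scores` and its `similarity` parameter are hypothetical names introduced here, and a small lookup table of made-up scores stands in for WordNet so the example runs without the corpus. It drops `None` values before taking the maximum and builds the result list inside the function, so repeated calls do not accumulate into a shared list.

```python
def similarity_scores(s1, s2, similarity):
    """For each item in s1, return its best non-None score against s2.

    `similarity` is any callable taking two items and returning a float or None.
    The result has the same length as s1; a row with no valid score yields None.
    """
    results = []  # local list, so each call starts fresh
    for item1 in s1:
        scores = (similarity(item1, item2) for item2 in s2)
        valid = (score for score in scores if score is not None)
        # default=None avoids a ValueError when every score in the row was None
        results.append(max(valid, default=None))
    return results


# Stand-in data: made-up scores keyed by (item1, item2); missing pairs mean "no path".
fake_scores = {
    ("a", "x"): 0.2,
    ("a", "y"): 0.5,
    ("b", "x"): 0.1,
}


def fake_similarity(item1, item2):
    # Hypothetical replacement for a synset comparison; returns None for unknown pairs.
    return fake_scores.get((item1, item2))


demo = similarity_scores(["a", "b", "c"], ["x", "y"], fake_similarity)
print(demo)  # [0.5, 0.1, None]
```

With real synsets, the call would look like `similarity_scores(s1, s2, lambda a, b: a.path_similarity(b))`.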
