python - calculate orthographic similarity between words of a list

Published: 2020/5/18 22:51:33. Tags: python, arrays, numpy, itertools, levenshtein-distance

Question

I need to calculate orthographic similarity (edit/Levenshtein distance) among words in a given corpus.
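Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one word into another. For reference, here is a minimal pure-Python sketch of that distance using the standard Wagner-Fischer dynamic-programming recurrence. The function name `levenshtein` is hypothetical; for plain strings it should agree with `Levenshtein.distance` from the python-Levenshtein package, but it is much slower.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b` (Wagner-Fischer DP)."""
    # The distance is symmetric, so keep the shorter string in `b`
    # to make the row buffer as small as possible.
    if len(a) < len(b):
        a, b = b, a
    # Row 0: turning the empty prefix of `a` into b[:j] costs j insertions.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # previous[j] now holds the distance between a[:i-1] and b[:j]
        current = [i]  # turning a[:i] into the empty string costs i deletions
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```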

As Kirill suggested below, I tried to do the following:

import csv, itertools, Levenshtein
import numpy as np

# import the list of words from csv file
path = '/Users/my path'
file = path + 'file.csv'

with open(file, 'rb') as f:
    reader = csv.reader(f)
    wordlist = list(reader)

wordlist = np.array(wordlist) #make it a np array
wordlist2 = wordlist[:,0] #subset the first column of the imported list

for a, b in itertools.product(wordlist, wordlist):
    if a < b:
        print(a, b, Levenshtein.distance(a, b))

However, I get the following error:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I understand the ambiguity in the code, but can someone help me figure out how to solve this? Thanks!
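Why the loop above fails: `itertools.product(wordlist, wordlist)` iterates over the 2-D array `wordlist`, not over its first column `wordlist2`, so `a` and `b` are whole rows (1-D NumPy arrays). When the CSV has more than one column, `a < b` compares the two rows element by element and returns a Boolean array, and `if` cannot reduce that array to a single True/False, hence the ValueError. Passing `wordlist2` to `product`, or working with plain Python strings, avoids the problem. The sketch below skips NumPy entirely; the sample CSV text and the helper name `first_column` are hypothetical, and it assumes the word sits in the first column.

```python
import csv
import io

# Hypothetical stand-in for file.csv: the word is in the first column,
# followed by an extra column (e.g. a frequency).
SAMPLE_CSV = "cat,12\ncart,7\ncar,30\n"


def first_column(csv_text):
    """Return the first field of every non-empty row as a plain string."""
    reader = csv.reader(io.StringIO(csv_text))
    return [row[0].strip() for row in reader if row]


words = first_column(SAMPLE_CSV)
# `words` is a list of str, so `a < b` is an ordinary string comparison
# that yields a single bool, and the `if` works as intended.
pairs = [(a, b) for a in words for b in words if a < b]
print(pairs)
```

With a real file on Python 3, open it with `open(file, newline='')` and pass the file object to `csv.reader`; the `'rb'` mode used in the question is the Python 2 idiom and makes Python 3's `csv.reader` raise an error because it receives bytes instead of strings.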

Answer

Here's the code I came up with, thanks to Kirill's help.

import csv
import itertools
import os

import Levenshtein  # provided by the python-Levenshtein package

# open the newline-separated list of words
path = '/Users/your path'
file = os.path.join(path, 'wordlists.txt')
output = os.path.join(path, 'ortho_similarities.txt')

# read one word per line, dropping blank lines and duplicates
with open(file) as src:
    words = sorted(set(s.strip() for s in src if s.strip()))

# the following loop takes all possible pairwise combinations
# of the words in the list, calculates the Levenshtein distance (LD)
# and writes everything to a CSV file
# (Python 3: open CSV output in text mode with newline='')
with open(output, 'w', newline='') as f:
    writer = csv.writer(f, delimiter=",", lineterminator="\n")
    for a, b in itertools.product(words, words):
        if a < b:  # keep each unordered pair once and skip self-pairs
            writer.writerow([a, b, Levenshtein.distance(a, b)])
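Since `words` is sorted and contains no duplicates, the `product` loop with the `if a < b` test keeps exactly the pairs `(words[i], words[j])` with `i < j`. `itertools.combinations(words, 2)` yields the same pairs in the same order without first generating all n*n ordered pairs. A small check on a hypothetical word list:

```python
import itertools

# Hypothetical sample; the answer builds `words` as sorted(set(...)).
words = sorted(set(["cart", "cat", "car", "cat", "bat"]))

# The answer's approach: all ordered pairs, then keep a < b.
via_product = [(a, b) for a, b in itertools.product(words, words) if a < b]

# Same pairs generated directly: each unordered pair once, no self-pairs.
via_combinations = list(itertools.combinations(words, 2))

n = len(words)
print(via_product == via_combinations)          # True
print(len(via_combinations), n * (n - 1) // 2)  # 6 6
```

In the answer's loop, `for a, b in itertools.combinations(words, 2):` can replace the `product` loop and its `if` test without changing the output file. Either way n*(n-1)/2 distances are computed, so the runtime still grows quadratically with the number of words.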
