NLP - Find similar/phonetic words and calculate a score in a paragraph


Problem description

I'm developing a simple NLP project in which we are given a set of words and need to find similar or phonetically similar words in a text. I've found a lot of algorithms, but no sample application.

It should also give a similarity score by comparing each keyword with the word that was found.

Can anybody help me?

from collections import Counter
from math import sqrt

def word2vec(word):
    # Represent a word as (character counts, character set, vector length).
    cw = Counter(word)
    sw = set(cw)
    lw = sqrt(sum(c*c for c in cw.values()))
    return cw, sw, lw

def cosdis(v1, v2):
    # Cosine similarity between two character-count vectors.
    common = v1[1].intersection(v2[1])
    return sum(v1[0][ch]*v2[0][ch] for ch in common)/v1[2]/v2[2]

list_A = ['e-commerce', 'ecomme', 'e-commercy', 'ecomacy', 'E-Commerce']
list_B = ['E-Commerce']

# Compare every candidate word with every keyword.
for word in list_A:
    for key in list_B:
        res = cosdis(word2vec(word), word2vec(key))
        print(res)

This code only does word-to-word comparison.


Recommended answer

I think you are looking for something like an API that first converts each word into IPA symbols (a form of phonetic notation), so that you can then compare the IPA transcriptions.

from collections import Counter
from math import sqrt
import eng_to_ipa as ipa

def word2vec(word):
    # Represent a word as (character counts, character set, vector length).
    cw = Counter(word)
    sw = set(cw)
    lw = sqrt(sum(c*c for c in cw.values()))
    return cw, sw, lw

def cosdis(v1, v2):
    # Cosine similarity between two character-count vectors.
    common = v1[1].intersection(v2[1])
    return sum(v1[0][ch]*v2[0][ch] for ch in common)/v1[2]/v2[2]

list_A = ['e-commerce', 'ecomme', 'e-commercy', 'ecomacy', 'E-Commerce']
list_B = ['E-Commerce']

# Convert every word to its IPA transcription before comparing.
IPA_list_a = [ipa.convert(each) for each in list_A]
IPA_list_b = [ipa.convert(each) for each in list_B]

# Compare the IPA strings instead of the original spellings.
for word in IPA_list_a:
    for key in IPA_list_b:
        res = cosdis(word2vec(word), word2vec(key))
        print(res)
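
One caveat about the score itself: cosdis compares character counts only, so it ignores character order and gives anagrams a perfect score. The sketch below contrasts it with an order-aware ratio from the standard library's difflib. The helper names char_cosine and sequence_ratio are made up for this illustration, and difflib is a swapped-in technique, not part of the original answer.

```python
from collections import Counter
from difflib import SequenceMatcher
from math import sqrt


def char_cosine(a, b):
    """Cosine similarity of character counts (same idea as cosdis)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[ch] * cb[ch] for ch in ca.keys() & cb.keys())
    norm_a = sqrt(sum(c * c for c in ca.values()))
    norm_b = sqrt(sum(c * c for c in cb.values()))
    # Guard against empty strings, which have a zero-length vector.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sequence_ratio(a, b):
    """Order-aware similarity in [0, 1] based on matching subsequences."""
    return SequenceMatcher(None, a, b).ratio()


# Print both scores side by side for an anagram pair and a near-miss spelling.
for a, b in [("listen", "silent"), ("e-commerce", "e-commercy")]:
    print(a, b, round(char_cosine(a, b), 3), round(sequence_ratio(a, b), 3))
```

The same comparison can be run on IPA strings instead of spellings; which score suits the project better depends on how much character order should matter.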

Check this out: https://github.com/mphilli/English-to-IPA

>>> import eng_to_ipa as ipa
>>> ipa.convert("The quick brown fox jumped over the lazy dog.")
'ðə kwɪk braʊn fɑks ʤəmpt ˈoʊvər ðə ˈleɪzi dɔg.'

The example is taken from the GitHub link above.
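
The question also asks for scores across a whole paragraph, not just word pairs. Here is a minimal sketch of one way to do that with the same character-count cosine measure. The names score_paragraph, threshold and encode, the tokenizing regex and the 0.8 cutoff are all assumptions made for this illustration. By default encode only lower-cases each word; a phonetic converter such as ipa.convert could be passed instead, but note that the eng_to_ipa README says words missing from its dictionary come back unchanged with an asterisk appended.

```python
import re
from collections import Counter
from math import sqrt


def char_vector(word):
    """Return character counts and their Euclidean norm for a word."""
    counts = Counter(word)
    norm = sqrt(sum(c * c for c in counts.values()))
    return counts, norm


def cosine(v1, v2):
    """Cosine similarity between two character-count vectors."""
    (c1, n1), (c2, n2) = v1, v2
    if n1 == 0 or n2 == 0:
        return 0.0
    dot = sum(c1[ch] * c2[ch] for ch in c1.keys() & c2.keys())
    return dot / (n1 * n2)


def score_paragraph(paragraph, keywords, threshold=0.8, encode=str.lower):
    """Return (token, keyword, score) for every pair scoring >= threshold.

    `encode` is a hypothetical hook applied to both tokens and keywords
    before comparison (lower-casing by default).
    """
    # Tokens are runs of letters, optionally joined by hyphens or apostrophes.
    tokens = re.findall(r"[A-Za-z]+(?:[-'][A-Za-z]+)*", paragraph)
    keyword_vectors = [(kw, char_vector(encode(kw))) for kw in keywords]
    matches = []
    for token in tokens:
        token_vector = char_vector(encode(token))
        for kw, kw_vector in keyword_vectors:
            score = cosine(token_vector, kw_vector)
            if score >= threshold:
                matches.append((token, kw, round(score, 3)))
    return matches


text = "Our ecommerce team sells books online; e-commercy is a common typo."
for token, keyword, score in score_paragraph(text, ["E-Commerce"]):
    print(token, keyword, score)
```

The threshold is a tuning knob: lowering it surfaces more candidates at the cost of more false matches, so it is worth checking against a few real paragraphs.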
