NLP: any easy and good methods to find semantic similarity between words?

Question

I don't know whether StackOverflow covers NLP, so I am gonna give this a shot. I am interested in finding the semantic relatedness of two words from a specific domain, e.g. "image quality" and "noise". I am doing some research to determine whether reviews of cameras are positive or negative for a particular attribute of the camera (like image quality in each one of the reviews).

However, not everybody uses the exact same wording "image quality" in their posts, so I am out to see if there is a way for me to build something like this:

"image quality", which includes ("noise", "color", "sharpness", etc.), so I can wrap everything under one big umbrella.

I am doing this for another language, so WordNet is not necessarily helpful. And no, I do not work for Google or Microsoft, so I do not have data from people's clicking behaviour as input data either.

However, I do have a lot of text, POS-tagged, segmented, etc.

Recommended answer

In order to find semantic similarity between words, a word space model should do the trick. Such a model can be implemented very easily and fairly efficiently. Most likely, you will want to implement some sort of dimensionality reduction. The easiest one I can think of is Random Indexing, which has been used extensively in NLP.
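A minimal sketch of what Random Indexing could look like, assuming the corpus is already segmented into lists of tokens. The function names (`make_index_vector`, `random_indexing`) and the parameter values (`dim`, `nonzero`, `window`) are illustrative choices, not something specified in the answer. Each word gets a sparse random "index vector" with a few +1/-1 entries, and a word's context vector is the sum of the index vectors of the words that appear around it.

```python
import random
from collections import defaultdict


def make_index_vector(dim, nonzero, rng):
    """Sparse ternary index vector, stored as {position: +1 or -1}."""
    positions = rng.sample(range(dim), nonzero)
    return {pos: rng.choice((-1, 1)) for pos in positions}


def random_indexing(sentences, dim=300, nonzero=10, window=2, seed=0):
    """Build context vectors from tokenized sentences.

    dim, nonzero and window are assumed defaults for illustration;
    suitable values depend on the corpus.
    """
    rng = random.Random(seed)
    index_vectors = {}
    context_vectors = defaultdict(lambda: [0] * dim)

    for tokens in sentences:
        # Assign a fixed random index vector to every new word.
        for tok in tokens:
            if tok not in index_vectors:
                index_vectors[tok] = make_index_vector(dim, nonzero, rng)

        # Add the index vectors of the neighbours to each word's context vector.
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            vec = context_vectors[word]
            for j in range(lo, hi):
                if j == i:
                    continue
                for pos, sign in index_vectors[tokens[j]].items():
                    vec[pos] += sign

    return dict(context_vectors)


# Toy corpus, made up for illustration only.
corpus = [
    ["the", "image", "has", "little", "noise"],
    ["the", "photo", "has", "little", "noise"],
]
vectors = random_indexing(corpus)
```

In this toy corpus "image" and "photo" occur in identical contexts, so they end up with identical context vectors. On real review text the vectors would only be approximately similar, and a stop-word filter or frequency weighting would probably be worth adding.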

Once you have your word space model, you can measure how close words are to each other (e.g. with cosine similarity). In such a model, you should get the results you mentioned earlier (the similarity between "focus" and "details" should be higher than that between "camera weight" and "flash").
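As a rough illustration of the comparison step, here is a sketch of cosine similarity over plain Python lists. The helper names (`cosine_similarity`, `cosine_distance`, `most_similar`) and the three toy vectors are hypothetical, made up for this example. In a real setup the vectors would come from the word space model. Note that cosine distance is taken here as one minus cosine similarity, so related words get a higher similarity and a lower distance.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """One minus the cosine similarity."""
    return 1.0 - cosine_similarity(u, v)


def most_similar(word, vectors, top_n=5):
    """Rank all other words by cosine similarity to `word`."""
    target = vectors[word]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hand-made toy vectors, for illustration only.
toy_vectors = {
    "noise": [1, 2, 0],
    "sharpness": [1, 1, 0],
    "battery": [0, 0, 3],
}
ranking = most_similar("noise", toy_vectors)
```

With these toy vectors, "sharpness" ranks above "battery" for the query "noise". Collecting the top-ranked neighbours of a seed term such as "image quality" is one possible way to build the kind of umbrella category described in the question.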

Hope this helps!
