nlp - How to detect if a word in a sentence is pointing to a color / body part / vehicle


Question

As the title suggests, I would like to know whether a certain word in a sentence is pointing to:

1] A color

The grass is green.

Hence "green" is a color

2] A body part

Her hands are soft

Hence "hands" is a body part

3] A vehicle

I am driving my car on the causeway

Hence "car" is a vehicle

For similar problems, parsers are one of the possible effective solutions. The Stanford parser, for example, was suggested in answer to a similar question:

How to find if a word in a sentence is pointing to a city

Now the problem is that the Stanford parser can only be used to detect:

LOCATION
ORGANIZATION
DATE
MONEY
PERSON
PERCENT
TIME

However, if you would like to detect something else, WordNet might be an option, as mentioned in a similar question:

How to list all English terms denoting animals in a sentence?

One of the answers suggested using WordNet and leveraging the hyponym/hypernym relation. The answer also mentioned the noun.animal file of WordNet.

The link below shows a list of all the other files in WordNet: https://wordnet.princeton.edu/man/lexnames.5WN.html

My approach is that I could make use of:

1] noun.body for body parts

2] noun.artifact for vehicles

3] the hyponym/hypernym relationship to detect whether a word is pointing to a color or not.

So would that be a valid approach?

And how can I make use of the hyponym/hypernym relation in WordNet?

NOTE: I am planning to use JWI (the MIT Java Wordnet Interface).

Recommended answer

Regarding the hyponymy/hypernymy approach: it involves exploring the WordNet tree and the relations it defines between words.

The hyponyms of a word (of a Synset, to be more accurate) represent concepts that are more specific, while hypernyms represent concepts that are more general. By analogy with the tree-like structure of WordNet, you can view the hyponyms as children of the word (node) you are looking at, and the hypernyms as its parents.

As an example, taking the hypernyms and the hyponyms of the word dog:

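# Requires NLTK with the WordNet corpus installed
from nltk.corpus import wordnet as wn

# Take the first sense listed for "dog"
dog = wn.synsets('dog')[0]
print(dog.hypernyms())
print(dog.hyponyms())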

which yields the following results:

[Synset('canine.n.02'), Synset('domestic_animal.n.01')]

[Synset('basenji.n.01'), Synset('corgi.n.01'), Synset('cur.n.01'), 
Synset('dalmatian.n.02'), Synset('great_pyrenees.n.01'), 
Synset('griffon.n.02'), Synset('hunting_dog.n.01'), Synset('lapdog.n.01'), 
Synset('leonberg.n.01'), Synset('mexican_hairless.n.01'), 
Synset('newfoundland.n.01'), Synset('pooch.n.01'), Synset('poodle.n.01'), 
Synset('pug.n.01'), Synset('puppy.n.01'), Synset('spitz.n.01'), 
Synset('toy_dog.n.01'), Synset('working_dog.n.01')]

In a similar manner, if we wanted to know which words represent colours, we could explore the hypernyms of several colour words, hoping that they share a common ancestor (hypernym). With that in mind, I ran the following experiments:

print(wn.synsets('green')[0].hypernyms())
print(wn.synsets('blue')[0].hypernyms())
print(wn.synsets('red')[0].hypernyms())
print(wn.synsets('yellow')[0].hypernyms())

all of which share the same hypernym list:

[Synset('chromatic_color.n.01')]

print(wn.synsets('black')[0].hypernyms())
print(wn.synsets('gray')[0].hypernyms())

yield the result:

[Synset('achromatic_color.n.01')]

The next thing we can do is print all the hyponyms of these resulting synsets:

print(wn.synset('chromatic_color.n.01').hyponyms())
print(wn.synset('achromatic_color.n.01').hyponyms())

giving the results:

[Synset('blond.n.02'), Synset('blue.n.01'), Synset('brown.n.01'), 
Synset('complementary_color.n.01'), Synset('green.n.01'), 
Synset('olive.n.05'), Synset('orange.n.02'), Synset('pastel.n.01'), 
Synset('pink.n.01'), Synset('purple.n.01'), Synset('red.n.01'), 
Synset('salmon.n.04'), Synset('yellow.n.01')]

[Synset('black.n.01'), Synset('gray.n.01'), Synset('white.n.02')]
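The lists above contain only direct hyponyms, and more specific terms may sit further down the tree. Below is a minimal sketch of collecting every descendant of a node. The graph is a toy one invented for illustration (it is not real WordNet data), and `all_descendants`, `toy_children` and `TOY_HYPONYMS` are hypothetical names. With NLTK, the `children` callable would presumably be something like `lambda s: s.hyponyms()`.

```python
from collections import deque

# Toy hyponym graph (parent -> children), invented for illustration only.
TOY_HYPONYMS = {
    "color": ["chromatic_color", "achromatic_color"],
    "chromatic_color": ["red", "green", "blue"],
    "achromatic_color": ["black", "gray", "white"],
    "red": ["crimson", "scarlet"],
}


def toy_children(node):
    """Return the direct hyponyms of a node in the toy graph."""
    return TOY_HYPONYMS.get(node, [])


def all_descendants(root, children):
    """Breadth-first walk collecting every node below root (root excluded)."""
    seen = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children(node):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


color_terms = all_descendants("color", toy_children)
print(sorted(color_terms))
```

A set built this way can serve as a simple colour vocabulary to test tokens against.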

The same technique could be applied to explore options relating to body parts or vehicles.
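As a rough sketch of how the three categories could be checked at once, the code below walks upward through hypernym links and reports the first category root it reaches. Everything here is an assumption made for illustration: the toy graph, the root names in `CATEGORY_ROOTS`, and the helper names `ancestors` and `categorize`. With NLTK, `parents` would presumably be `lambda s: s.hypernyms()`, and choosing the right sense of an ambiguous word would still have to be handled separately.

```python
# Toy hypernym graph (child -> parents), invented for illustration only.
TOY_HYPERNYMS = {
    "green": ["chromatic_color"],
    "chromatic_color": ["color"],
    "hand": ["extremity"],
    "extremity": ["body_part"],
    "car": ["motor_vehicle"],
    "motor_vehicle": ["vehicle"],
    "dog": ["canine", "domestic_animal"],
}

# Hypothetical category roots mapped to human-readable labels.
CATEGORY_ROOTS = {
    "color": "color",
    "body_part": "body part",
    "vehicle": "vehicle",
}


def toy_parents(node):
    """Return the direct hypernyms of a node in the toy graph."""
    return TOY_HYPERNYMS.get(node, [])


def ancestors(node, parents):
    """Collect every node reachable by following parent links upward."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in parents(current):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def categorize(node, parents, roots=CATEGORY_ROOTS):
    """Return the label of the first category root found at or above node."""
    reachable = ancestors(node, parents) | {node}
    for root, label in roots.items():
        if root in reachable:
            return label
    return None


for word in ["green", "hand", "car", "dog"]:
    print(word, "->", categorize(word, toy_parents))
```

Walking the full chain rather than only the direct hypernyms matters because a category root may be several levels above the word.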

Also, for derivative words such as reddish, I know of two ways to work around their absence:

  • Stemming the tokenized text by means of the Porter Stemmer (see this link)
  • Using Morphy to get the base forms, so that you can look up the resulting words in WordNet (see this link for details on Morphy). I would recommend this method, since stemming could yield words which do not exist in WordNet
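To illustrate why a lexicon-checked lookup is safer than plain stemming, here is a simplified sketch of the general idea: strip a suffix, then keep the candidate only if it is a known word. This is not Morphy's actual rule set and not the NLTK API. The lexicon, the rules and the name `base_forms` are all hypothetical stand-ins.

```python
# Toy lexicon standing in for the list of words known to WordNet.
TOY_LEXICON = {"hand", "car", "red", "green", "drive"}

# A small, illustrative subset of suffix rewrite rules: (suffix, replacement).
SUFFIX_RULES = [
    ("s", ""),
    ("es", ""),
    ("ing", ""),
    ("ing", "e"),
    ("ed", ""),
    ("ed", "e"),
]


def base_forms(word, lexicon=TOY_LEXICON, rules=SUFFIX_RULES):
    """Return candidate base forms of word that exist in the lexicon."""
    word = word.lower()
    candidates = []
    if word in lexicon:
        candidates.append(word)
    for suffix, replacement in rules:
        if word.endswith(suffix) and len(word) > len(suffix):
            candidate = word[: -len(suffix)] + replacement
            # Unlike a plain stemmer, discard anything the lexicon lacks.
            if candidate in lexicon and candidate not in candidates:
                candidates.append(candidate)
    return candidates


for token in ["hands", "driving", "cars", "xyzzy"]:
    print(token, "->", base_forms(token))
```

Because every candidate is checked against the lexicon, the function returns either words that can be looked up or an empty list, never a truncated non-word.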
