Wordnet SQL Explanation


Question

I'm trying to get a simple synonym database up and running, so I can find synonyms of words the user entered (nothing else!). For this I grabbed a copy of the WordNet SQL thesaurus (http://wnsql.sourceforge.net/), but now I'm presented with all these tables, and I can't find a simple explanation of their content anywhere:

adjpositions
adjpositiontypes
casedwords
lexdomains
lexlinks
linktypes
morphmaps
morphs
postypes
samples
semlinks
senses
synsets
vframemaps
vframes
vframesentencemaps
vframesentences
words

Could someone tell me what these tables contain and which ones I need? I can't decipher their content from the data alone.

Recommended answer

WordNet is a super cool word database. I have been researching it myself. I'll list my findings below, and hopefully they will help you understand the tables better.

The Synset Table

The synsets table is one of the most important tables in the database. It houses all the definitions within WordNet. Each row in the synsets table has a synsetid, a definition, a pos (part-of-speech field) and a lexdomainid (which links to the lexdomains table). There are 117373 synsets in the WordNet database.

The Words Table

WordNet also has a "words" table that has only two fields: a wordid and a "lemma". The words table houses all the lemmas (base words) within the WordNet database. There are 146625 entries in this table.

So, how are these two tables linked? The answer: the senses table!

The Sense Table

The senses table links words (in the words table) with definitions (in the synsets table). The entries in the senses table are referred to as "word-sense pairs", because each pairing of a wordid with a synsetid is one complete meaning of a word: a "sense of the word".
There are a total of 206,354 word senses in the WordNet database.
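As a rough illustration of how the three tables fit together, here is a minimal sketch using Python's built-in sqlite3 module and a toy in-memory database. The column names follow the fields named above; the column types, the ids, and the definitions are assumptions made up for the example, and the real WordNet SQL tables carry additional columns.

```python
import sqlite3

# Toy in-memory stand-in for the words, synsets and senses tables.
# Column types and all sample rows are assumptions for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE words   (wordid INTEGER PRIMARY KEY, lemma TEXT);
CREATE TABLE synsets (synsetid INTEGER PRIMARY KEY, pos TEXT,
                      lexdomainid INTEGER, definition TEXT);
CREATE TABLE senses  (wordid INTEGER, synsetid INTEGER, senseid INTEGER);
""")
conn.execute("INSERT INTO words VALUES (1, 'carry')")
conn.execute("INSERT INTO synsets VALUES (10, 'v', 0, 'toy definition A')")
conn.execute("INSERT INTO synsets VALUES (11, 'v', 0, 'toy definition B')")
conn.executemany("INSERT INTO senses VALUES (?, ?, ?)",
                 [(1, 10, 100), (1, 11, 101)])

# Each senses row pairs one word with one synset: one "sense of the word".
rows = conn.execute("""
    SELECT w.lemma, s.senseid, y.definition
    FROM words w
    JOIN senses s  ON s.wordid = w.wordid
    JOIN synsets y ON y.synsetid = s.synsetid
    ORDER BY s.senseid
""").fetchall()

for row in rows:
    print(row)
```

One word with two rows in senses comes back as two word-sense pairs, each with its own definition.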

The Lexdomains Table

The lexdomains table is referenced by the synsets table (through the lexdomainid field mentioned above) and defines which lexical domain a synset, and therefore each of its word-sense pairs, belongs to. There are 45 lexical domains in the lexdomains table. The lexdomains table is therefore WordNet's way of "tagging" a word-sense pair. However, it is quite limited, because a word-sense pair can only belong to ONE lexical domain.

The 45 lexical domains include:

Adjectives: all, pert

Adverbs: all

Nouns: tops, act, animal, artifact, attribute, body, cognition, communication, event, feeling, food, group, location, motive, object, person, phenomenon, plant, possession, process, quantity, linkdef, shape, state, substance, time

Verbs: body, change, cognition, communication, competition, consumption, contact, creation, emotion, motion, perception, possession, social, stative, weather, ppl

The Casedwords Table

Some words within the words table are naturally capitalized, e.g. "A-team". Since the words table stores all words in lowercase, WordNet uses this table to record the cased version of the word. There are 40313 entries in this table.

There are many other tables in the WordNet DB. Once I have researched them, I'll post again.

Finding your synonyms

To answer your question regarding synonyms, you need to do the following.

Let's say you want to find the synonyms for the word "Carry". To do so, you would first search the words table for a lemma matching the word "carry". This would yield the wordid 21253. You would then search the senses table to find all word-sense pairs for the word "carry". This yields 41 results. Each result lists the wordid 21253, a senseid (which is the index of the word-sense pair) and a synsetid.
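These first two lookups could be sketched as follows, again with sqlite3 and a toy in-memory database. The wordid 21253 and the synsetid 202083512 come from this answer; the second synsetid, the senseid values, and the helper name `senses_for` are placeholders invented for the example, and the toy table holds two senses rather than the real 41.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE words  (wordid INTEGER PRIMARY KEY, lemma TEXT);
CREATE TABLE senses (wordid INTEGER, synsetid INTEGER, senseid INTEGER);
INSERT INTO words VALUES (21253, 'carry');
-- Only the first synsetid is taken from the text; other values are placeholders.
INSERT INTO senses VALUES (21253, 202083512, 1), (21253, 999000001, 2);
""")

def senses_for(lemma):
    """Return (wordid, senseid, synsetid) rows for a lemma (hypothetical helper)."""
    # Lemmas are stored in lowercase, so normalise the user's input first.
    row = conn.execute(
        "SELECT wordid FROM words WHERE lemma = ?", (lemma.lower(),)
    ).fetchone()
    if row is None:
        return []
    return conn.execute(
        "SELECT wordid, senseid, synsetid FROM senses "
        "WHERE wordid = ? ORDER BY senseid",
        (row[0],),
    ).fetchall()

pairs = senses_for("Carry")
print(pairs)
```

Lowercasing the input before the lookup matters because the user typed "Carry" while the words table holds "carry".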

Next, you would query the synsets table for each of the synsetids returned, so you can access the associated definition field.
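Rather than issuing one query per synsetid, the definitions can be fetched in a single join. The sketch below assumes the same toy sqlite3 setup as before; the second synset and its definition are made up, and only the ids 21253 and 202083512 and the first definition come from this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE senses  (wordid INTEGER, synsetid INTEGER, senseid INTEGER);
CREATE TABLE synsets (synsetid INTEGER PRIMARY KEY, definition TEXT);
INSERT INTO senses VALUES (21253, 202083512, 1), (21253, 999000001, 2);
INSERT INTO synsets VALUES
    (202083512, 'transmit or serve as the medium for transmission'),
    (999000001, 'placeholder definition (not from WordNet)');
""")

# Join every sense of the word to its synset to read the definition.
definitions = conn.execute("""
    SELECT s.synsetid, y.definition
    FROM senses s
    JOIN synsets y ON y.synsetid = s.synsetid
    WHERE s.wordid = ?
    ORDER BY s.senseid
""", (21253,)).fetchall()

for synsetid, definition in definitions:
    print(synsetid, definition)
```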

Example: One of the 41 word-sense pairs for the word "carry" has the synsetid 202083512. If we look up the definition for this synsetid, we find "transmit or serve as the medium for transmission".

To find all the synonyms for this definition, you would then search the senses table for the same synsetid 202083512. This yields the synonyms channel, conduct, convey, impart, and transmit (note: you will need to left join the words table to get the actual lemmas).
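Putting the whole walk-through together, one possible single-query version is sketched below. It is an illustration under assumptions, not the only way to do it: the wordid 21253, the synsetid 202083512 and the lemmas come from this answer, while every other wordid and the function name `synonyms` are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE words  (wordid INTEGER PRIMARY KEY, lemma TEXT);
CREATE TABLE senses (wordid INTEGER, synsetid INTEGER);
""")
# Wordids other than 21253 are placeholders, not real WordNet ids.
lemmas = ["carry", "channel", "conduct", "convey", "impart", "transmit"]
wordids = [21253, 1, 2, 3, 4, 5]
conn.executemany("INSERT INTO words VALUES (?, ?)", list(zip(wordids, lemmas)))
conn.executemany("INSERT INTO senses VALUES (?, 202083512)",
                 [(wid,) for wid in wordids])

def synonyms(lemma):
    """Map each synsetid of `lemma` to the other lemmas sharing that synset."""
    rows = conn.execute("""
        SELECT s2.synsetid, w2.lemma
        FROM words w1
        JOIN senses s1 ON s1.wordid = w1.wordid      -- senses of the input word
        JOIN senses s2 ON s2.synsetid = s1.synsetid  -- all senses in those synsets
        JOIN words  w2 ON w2.wordid = s2.wordid      -- back to readable lemmas
        WHERE w1.lemma = ? AND w2.wordid <> w1.wordid
        ORDER BY s2.synsetid, w2.lemma
    """, (lemma.lower(),)).fetchall()
    result = {}
    for synsetid, syn in rows:
        result.setdefault(synsetid, []).append(syn)
    return result

print(synonyms("Carry"))
```

Grouping by synsetid keeps synonyms for different meanings apart, so an application can show them per definition. The sketch uses inner joins because every senses row in the toy data has a matching word.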

I hope this helps demystify WordNet for you. I'm finding it to be quite cool.
