Database design for tree structured data


Problem description

I have a data set which I need to analyze, but I am having a problem
figuring out a structure for the database - or whether there are better
ways of attacking the problem.

The base data set is a large number of replies to a survey
questionnaire. I have adapted an NLP program to produce lexically annotated
structural trees which appear to reduce the text to descending trees
reasonably accurately. These trees consist of multiple nodes, each of which
consists of sub-nodes to an indeterminate level (normally 1 to 7 subnodes),
with each node containing 1-5 branches depending on the statistical
probability of each branch. Without getting pedantic, it quickly becomes a
very complex tree, and manual interpretation of the 25k or so sentences is
impractical. What I am specifically looking for is a data structure to
contain the sentences and the lexical descriptions of phrases within each
sentence, with a descent into each phrase that classifies each part of the
phrase until the entire entry decomposes into individual words classified
by lexical type. I can get the information to populate the structure - I
just can't figure out a way to store the results for aggregate study.

Can someone suggest possible database designs for tree-structured data such
as this or point me to references dealing with this type of analysis? I
cannot visualize a usable structure, and "you can't get there from here"
would be just as appropriate an answer as any. Suggestions on
tackling aggregation for this form of data would be greatly appreciated.

Solution


Will Honea wrote:

> [...]
> Can someone suggest possible database designs for tree-structured data such
> as this or point me to references dealing with this type of analysis?

There are a number of ways to represent trees in an RDBMS. Google for
transitive closure, nested set, adjacency list. Here's a link to my notes
from implementing a tree in DB2.

http://fungus.teststation.com/~jon/t...eeHandling.htm

Very sloppy, but it should give you some ideas. I implemented add, move
and delete operations in triggers. A typical tree contained 10^5 -
10^6 nodes, and the transitive closure (described as Path in the link; I
wasn't familiar with the term then) was approx. 10 times bigger.

I think I still have the DDL lying around somewhere, so if it would be
of interest, drop me a note.
/Lennart
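
For readers unfamiliar with the term, a transitive closure table can be sketched roughly as follows. This is only an illustration, not Lennart's DB2 implementation: it uses Python's sqlite3 module in place of DB2, rebuilds the closure in one pass instead of maintaining it in triggers, and the names `path`, `build_closure` and `descendants` are made up for the example. The root is stored as its own parent, as in the thread.

```python
import sqlite3

def build_closure(conn):
    """Rebuild path(ancestor, descendant, depth) from tree(node_id, parent_id)."""
    conn.execute("DELETE FROM path")
    conn.execute("""
        WITH RECURSIVE walk(ancestor, descendant, depth) AS (
            -- every node reaches itself at depth 0
            SELECT node_id, node_id, 0 FROM tree
            UNION ALL
            -- extend each known path by one child; skip the self-parented root
            SELECT w.ancestor, t.node_id, w.depth + 1
            FROM walk AS w
            JOIN tree AS t ON t.parent_id = w.descendant
            WHERE t.node_id <> t.parent_id
        )
        INSERT INTO path (ancestor, descendant, depth)
        SELECT ancestor, descendant, depth FROM walk
    """)

def descendants(conn, node_id):
    """All nodes below node_id, found with a plain lookup (no recursion)."""
    rows = conn.execute(
        "SELECT descendant FROM path "
        "WHERE ancestor = ? AND depth > 0 ORDER BY descendant",
        (node_id,),
    )
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tree (
        node_id   INTEGER NOT NULL PRIMARY KEY,
        parent_id INTEGER NOT NULL REFERENCES tree
    );
    CREATE TABLE path (
        ancestor   INTEGER NOT NULL REFERENCES tree,
        descendant INTEGER NOT NULL REFERENCES tree,
        depth      INTEGER NOT NULL,
        PRIMARY KEY (ancestor, descendant)
    );
""")
# Hypothetical tree: 1 is the root; 2 and 3 under 1; 4 and 5 under 2; 6 under 4.
conn.executemany(
    "INSERT INTO tree (node_id, parent_id) VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 4)],
)
build_closure(conn)
print(descendants(conn, 2))
```

The closure holds one row per (ancestor, descendant) pair, so it grows with the depth of the tree, which is why it ends up several times larger than the node table itself.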


On Thu, 23 Nov 2006 21:37:46 -0800, Lennart wrote:

> There are a number of ways to represent trees in an RDBMS. Google for
> transitive closure, nested set, adjacency list. Here's a link to my notes
> from implementing a tree in DB2.
>
> http://fungus.teststation.com/~jon/t...eeHandling.htm
>
> [...]

Interesting - I had never considered it from this perspective. What you
seem to be doing is implementing a balanced tree structure although my
first thought is that I need nodes considerably wider. Let me think on
this a bit more... I can see where this might well fit as it
potentially abstracts the lexical construct from the input form making
the storage/search issues much more tractable.
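
One way the parse trees from the question might map onto such a table is sketched below. Everything here is an assumption made for illustration: the `node` table and its columns, the `store_tree` helper, the nested-tuple input format, and the Penn Treebank style labels are not from the thread, and SQLite (via Python's sqlite3) stands in for whatever RDBMS is actually used. The root of each sentence has a NULL parent in this sketch.

```python
import sqlite3
from itertools import count

SCHEMA = """
CREATE TABLE node (
    node_id     INTEGER PRIMARY KEY,
    sentence_id INTEGER NOT NULL,
    parent_id   INTEGER REFERENCES node,  -- NULL for the root of a sentence
    label       TEXT NOT NULL,            -- phrase or word category
    word        TEXT                      -- set only on leaf nodes
);
"""

def store_tree(conn, sentence_id, tree, ids, parent_id=None):
    """Insert a nested (label, children-or-word) tuple depth-first."""
    label, body = tree
    node_id = next(ids)
    word = body if isinstance(body, str) else None
    conn.execute(
        "INSERT INTO node VALUES (?, ?, ?, ?, ?)",
        (node_id, sentence_id, parent_id, label, word),
    )
    if word is None:
        for child in body:
            store_tree(conn, sentence_id, child, ids, node_id)

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
ids = count(1)

# Two made-up parses standing in for the NLP program's output.
parses = [
    ("S", [("NP", [("DT", "the"), ("NN", "survey")]),
           ("VP", [("VBD", "was"), ("ADJP", [("JJ", "long")])])]),
    ("S", [("NP", [("PRP", "it")]),
           ("VP", [("VBD", "was"), ("ADJP", [("JJ", "fine")])])]),
]
for sentence_id, parse in enumerate(parses, start=1):
    store_tree(conn, sentence_id, parse, ids)

# Aggregate across all sentences: how often each parent label has each child label.
pair_counts = conn.execute("""
    SELECT p.label, c.label, COUNT(*)
    FROM node AS c
    JOIN node AS p ON p.node_id = c.parent_id
    GROUP BY p.label, c.label
    ORDER BY p.label, c.label
""").fetchall()
for row in pair_counts:
    print(row)
```

Because every node is one row regardless of depth or fan-out, aggregate questions become ordinary GROUP BY queries; a per-branch probability could be added as one more column on the same table.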



Will Honea wrote:

> On Thu, 23 Nov 2006 21:37:46 -0800, Lennart wrote:
>
>> [...]
>
> Interesting - I had never considered it from this perspective. What you
> seem to be doing is implementing a balanced tree structure although my
> first thought is that I need nodes considerably wider.

I'm not sure what you mean. Could you explain in more detail what you
mean by wider? Assume the following table:

    -- adjacency list: every node points at its parent
    create table tree (
        node_id int not null primary key,
        parent_id int not null references tree
    );

    -- the root is its own parent
    insert into tree (node_id, parent_id) values (1,1);
    -- nodes 2..1000 all become children of the root
    -- (start at 2: node 1 already exists and would violate the primary key)
    insert into tree (node_id, parent_id)
    with iter (n) as (values 2 union all select n+1 from iter where n<1000)
    select n,1 from iter;

Isn't that wide enough?

/Lennart
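
The point about width can be checked with a small sketch. It recreates the same table in SQLite through Python's sqlite3 module (an assumption made for convenience; the thread uses DB2) and counts the children per parent. The generated ids start at 2 so they do not collide with the root row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tree (
        node_id   INTEGER NOT NULL PRIMARY KEY,
        parent_id INTEGER NOT NULL REFERENCES tree
    )
""")
# The root is its own parent, as in the post above.
conn.execute("INSERT INTO tree (node_id, parent_id) VALUES (1, 1)")
# Nodes 2..1000 all hang directly off the root.
conn.execute("""
    WITH RECURSIVE iter(n) AS (
        SELECT 2 UNION ALL SELECT n + 1 FROM iter WHERE n < 1000
    )
    INSERT INTO tree (node_id, parent_id) SELECT n, 1 FROM iter
""")

# Fan-out per parent, ignoring the root's reference to itself.
fan_out = conn.execute("""
    SELECT parent_id, COUNT(*)
    FROM tree
    WHERE node_id <> parent_id
    GROUP BY parent_id
""").fetchall()
print(fan_out)
```

Nothing in the table definition limits how many rows may share a `parent_id`, so a node can be as wide as the data requires.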

