用于建模泛化层次结构的SQL [英] SQL for Modeling Generalization Hierarchies

查看:64
本文介绍了用于建模泛化层次结构的SQL的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

是否有一种很好的方法来建模许多异构实体类型

,它们有一些共同的属性?


假设我有实体雇员分享一些属性(例如

firstname,lastname,dateofbirth)但是一些员工子集(例如,医生,门卫,护士,救护车司机)可能还有其他的东西。 >
属性不适用于所有员工。医生可能有

属性专业和董事会认证日期,救护车

司机可能有驾驶执照ID,看门人可能有

preferredbroomtype等等on。


部署应用程序后,有很多员工子类型可以动态添加

所以显然没有好处/>
向employees表添加属性,因为大多数属性都是
为NULL(因为看门人从不是同时的医生)。


我找到的唯一解决方案是泛化层次结构,其中

您拥有包含所有通用属性的employee表,然后您需要为每个新员工子类型添加表b
。子类型

表共享employee表的主键。员工表

有一个鉴别器。允许您确定要为特定实体加载哪个

子类型表的字段。

此解决方案似乎无法扩展,因为对于每个值

" discriminator"我需要使用不同的表执行连接。什么

如果我需要一次检索1,000名员工?


是否可以通过一个SQL语句获得单个ResultSet

SQL?


或者你需要迭代查看鉴别器然后

执行适当的连接吗?如果这种迭代是必要的那么

那么显然这种泛化层次结构方法在

练习中不起作用

因为它会非常缓慢。 br />

有没有更好的方法来建模这些具有共享属性的异构

实体,这些实体不涉及创建表格

for每个新员工类型或有sparce表(大多数填写

NULLS)


我想另一种方法是使用名称/值对但是

会让报告变得非常难看。


似乎是一个非常常见的问题。有任何想法吗?这是SQL的基本限制吗?


谢谢!


- 罗伯特

Is there a good approach to modelling many heterogeneous entity types
with that have some attributes in common?

Say I have entities "employees" which share some attibutes (e.g.
firstname, lastname, dateofbirth) but some subsets of employees (e.g.
physicians, janitors, nurses, ambulance drivers) may have additional
attributes that do not apply to all employees. Physicians may have
attributes specialty and date of board certification, ambulance
drivers may have a drivers license id, janitors may have
preferredbroomtype and so on.

There are many employee subtypes and more can be dynamically added
after the application is deployed so it''s obviously no good to keep
adding attributes to the employees table because most attributes will
be NULL (since janitors are never doctors at the same time).

The only solution I found for this is a generalization hiearchy where
you have the employee table with all generic attributes and then you
add tables for each new employee subtype as necessary. The subtype
tables share the primary key of the employee table. The employee table
has a "discriminator" field that allows you to figure out which
subtype table to load for a particular entity.

This solution does not seem to scale since for each value of
"discriminator" I need to perform a join with a different table. What
if I need to retrieve 1,000 employees at once?

Is that possible to obtain a single ResultSet with one SQL statement
SQL?

Or do you I need to iterate look at the discriminator and then
perform the appropriate join? If this kind of iteration is necessary
then obviously this generalization hierarchy approach does not work in
practice
since it would be painfully slow.

Is there a better approach to modelling these kind of heterogeneous
entities with shared attributes that does not involve creating a table
for each new employee type or having sparce tables (mostly filled with
NULLS)

I guess another approach would be to use name/value pairs but that
would make reporting really ugly.

Seems like a very common problem. Any ideas? Is this a fundamental
limitation of SQL?

Thanks!

- robert

推荐答案



" Robert Brown" < RO ************* @ yahoo.com>在消息中写道

news:24 ************************** @ posting.google.c om ...

"Robert Brown" <ro*************@yahoo.com> wrote in message
news:24**************************@posting.google.c om...
是否有一种很好的方法来建模许多异构实体类型
,它们有一些共同的属性?


这是一个经常被问到的问题。不幸的是,经常给出的

回复通常不能解决问题。


有很多员工子类型,可以动态添加更多
在部署应用程序之后,显然没有好处继续向employees表添加属性,因为大多数属性都是NULL(因为janitors不是同时的医生)。

我找到的唯一解决方案是泛化层次结构,其中您拥有包含所有通用属性的员工表,然后根据需要为每个新员工子类型添加表。子类型
表共享employee表的主键。员工表
有一个鉴别器。允许您确定要为特定实体加载哪个子类型表的字段。


如果您在特定的基础上发明新的亚型,(例如,DBA绝不是一个

看门人或医生)

并且您在同样临时的基础上为子类型创建新表格,我会建议您在临时基础上更改数据模型。


如果你愿意,你可以这样做,但不要指望从稳定的数据模型中得到同样的力量和简单性。

这个解决方案可以做到这一点。因为对于
discriminator的每个值,似乎没有按比例缩放。我需要使用不同的表执行连接。什么
如果我需要一次检索1,000名员工?


为什么你需要多个子类型的属性?如果你的

查询是关于医生的,为什么你需要加入关于看门人的表格

查询?


如果你的查询是关于员工的,为什么你需要知道任何

子类型属性?

或者你我需要迭代看看鉴别器然后
执行适当的连接?如果这种迭代是必要的那么显然这种泛化层次结构方法在
实践中不起作用,因为它会非常缓慢。


您需要询问有关模型的两个问题,以及基于模型的任何imlpementation



优先,是它正确完整吗?第二,如果是第一个,那么它是否足够快?


一般来说,同一模型的不同实现应该是

透明给imlpementation的用户只看模型中可见的



是否有更好的方法来建模这些异构
具有共享属性的实体,不涉及为每个新员工类型创建表
或有sparce表(大多数填充了
NULLS)


如果你有新的实体,你将会有新的关系。

意味着拥有新表,或者捏造表与

rleations之间的关系。做第二个是你自己的危险。

似乎是一个非常普遍的问题。有任何想法吗?这是SQL的基本限制吗?
Is there a good approach to modelling many heterogeneous entity types
with that have some attributes in common?
This is a frequently asked question. Unfortunately, the frequently given
responses don''t usually settle the matter.

There are many employee subtypes and more can be dynamically added
after the application is deployed so it''s obviously no good to keep
adding attributes to the employees table because most attributes will
be NULL (since janitors are never doctors at the same time).

The only solution I found for this is a generalization hiearchy where
you have the employee table with all generic attributes and then you
add tables for each new employee subtype as necessary. The subtype
tables share the primary key of the employee table. The employee table
has a "discriminator" field that allows you to figure out which
subtype table to load for a particular entity.
If you invent new subtypes on an adhoc basis, (e.g. a DBA is never either a
janitor or a doctor)
and you invent new tables for subtypes on an equally ad hoc basis, I''d
suggest that you are altering your data model on an ad hoc basis.

You can do that if you want, but don''t expect the same power and simplicity
that you get from a stable data model.

This solution does not seem to scale since for each value of
"discriminator" I need to perform a join with a different table. What
if I need to retrieve 1,000 employees at once?
Why would you ever need the attributes of more than one subtype? If your
query is about doctors, why would you need to join the table about janitors
into the query?

If your query is about employees, why would you need to know any of the
subtype attributes?

Or do you I need to iterate look at the discriminator and then
perform the appropriate join? If this kind of iteration is necessary
then obviously this generalization hierarchy approach does not work in
practice
since it would be painfully slow.
You need to ask two questions about the model, and about any imlpementation
based on the model:
first, is it correct and complete? Second, if yes to the first, is it fast
enough?

In general, different implementations of the same model should be
transparent to users of the imlpementation who only look at the features
visible in the model.


Is there a better approach to modelling these kind of heterogeneous
entities with shared attributes that does not involve creating a table
for each new employee type or having sparce tables (mostly filled with
NULLS)
If you have new entities, you are going to have new relations. That either
means having new tables, or fudging the relationship between tables and
rleations. Do the second one at your own peril.
Seems like a very common problem. Any ideas? Is this a fundamental
limitation of SQL?




我建议它超越SQl到使用外键的核心/

主键建立联系。



I would suggest it goes beyond SQl to the very heart of using foreign key/
primary key aossciations to establish linkages.




" Robert Brown" < RO ************* @ yahoo.com>在消息中写道

news:24 ************************** @ posting.google.c om ...

"Robert Brown" <ro*************@yahoo.com> wrote in message
news:24**************************@posting.google.c om...
将属性添加到employees表中,因为大多数属性都是NULL(因为看门人从不是医生同时)。
adding attributes to the employees table because most attributes will
be NULL (since janitors are never doctors at the same time).




顺便说一句,在我写完DBA永远不是看门人或医生之后它对我来说是个b $ b,当我还是一名DBA的时候,有几天我觉得我两个人同时都是b $ b! ;)



By the way, after I wrote "a DBA is never a janitor or a doctor" it
occurred to me that when I was a DBA there were days when I felt like I was
both at the same time! ;)


Robert Brown(ro*************@yahoo.com)写道:
Robert Brown (ro*************@yahoo.com) writes:
我找到的唯一解决方案是泛化层次结构,其中您拥有包含所有通用属性的员工表,然后根据需要为每个新员工子类型添加表。子类型
表共享employee表的主键。员工表
有一个鉴别器。允许您确定要为特定实体加载哪个子类型表的字段。

此解决方案似乎无法扩展,因为对于
discriminator的每个值都是如此。我需要使用不同的表执行连接。什么
如果我需要一次检索1,000名员工?


那又怎样?现代RDBMS的1000行是轻而易举的。

是否可以通过一个SQL语句获得单个ResultSet
SQL?


是的,虽然有许多鉴别器,但它将是一个地狱般的加入:


SELECT main.col1,main.col2 ,. .. d1.col1,d1.col2,...

FROM main

LEFT JOIN鉴别器d1 ON d1.keycol = main.keycol

LEFT JOIN鉴别器d2 ON d2.keycol = main.keycol

...


在实践中,事情可能更加混乱,因为某些值可能

适用于四个鉴别器,但与其余鉴别器无关。在

中,它们将是四个不同的列。虽然,这可能是

解决:


common_to_four = colaesce(d3.common,d6.common,d9.common,d11.common)


但是将它作为一个可以为空的列添加到常见的

表中可能会更好。


同样,如果两个判别器非常相似,它们可能更好地将它们放在同一个表中。

我想另一种方法是使用名称/值对但是那个

您可以减少对拼写错误的控制。但有时候这可以通过它来实现。

似乎是一个非常常见的问题。有任何想法吗?这是SQL的基本限制吗?
The only solution I found for this is a generalization hiearchy where
you have the employee table with all generic attributes and then you
add tables for each new employee subtype as necessary. The subtype
tables share the primary key of the employee table. The employee table
has a "discriminator" field that allows you to figure out which
subtype table to load for a particular entity.

This solution does not seem to scale since for each value of
"discriminator" I need to perform a join with a different table. What
if I need to retrieve 1,000 employees at once?
So what? 1000 rows for a modern RDBMS is a breeze.
Is that possible to obtain a single ResultSet with one SQL statement
SQL?
Yes, although with many discriminators, it will be one hell of a join:

SELECT main.col1, main.col2, ... d1.col1, d1.col2, ...
FROM main
LEFT JOIN discriminator d1 ON d1.keycol = main.keycol
LEFT JOIN discriminator d2 ON d2.keycol = main.keycol
...

In practice, things might be even messier, because some values might
apply to four discriminators, but be irrelevant to the rest. In
they the would be four different columns. Although, this could be
addressed with:

common_to_four = colaesce(d3.common, d6.common, d9.common, d11.common)

But it may be better to add this as a nullable column to the common
table.

Likewise, if two discrimiators are very similar, it may be better to
lump them in the same table.
I guess another approach would be to use name/value pairs but that
would make reporting really ugly.
And you get less control over your spelling errors. But sometimes this
is the way to go.
Seems like a very common problem. Any ideas? Is this a fundamental
limitation of SQL?




排序。 SQL表是平方的,面向对象的层次是

参差不齐。


但是这没关系,只是不要过于面向对象。也是务实的。


-

Erland Sommarskog,SQL Server MVP,所以**** @ algonet.se


SQL Server SP3的联机书籍
http://www.microsoft.com/sql/techinf...2000/books.asp


这篇关于用于建模泛化层次结构的SQL的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆