异构属性集的表示 [英] Representation for Heterogeneous Attribute Set

查看:105
本文介绍了异构属性集的表示的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的公司正在研究债券衍生品投资组合分析工具和

我们正面临一个问题,我没有看到任何

文献中的问题。我真的做过RTFM。我在

关系建模(10年以上)方面非常有经验,所以这不是理解原则的情况。这是一个被剥夺了

无关背景的问题。下面的问题是为了

的例子而简化,所以不要赘述。


问题


1.债券种类很多,每种类型都有不同的

属性,不同的属性名称,不同的属性数据类型。


例如,债券A有两个变量:年利率和发行日期,B有五个变量:利率和4个特定的

日期,各个部分需要支付本金,债券C

有一组4个变量:期间1的利率,

期间的利率2,债券的日期可以放回发行人,

和两个可以通过该问题调用债券的日期。等等。


因此,在第一次尝试时,我可以将每种债券类型表示为自己的

表。例如,


create table bond_type_a(rate INTEGER,issue_date DATE)

create table bond_type_b(rate INTEGER,principle_date1 DATE,

principle_date2 DATE,principle_date3 DATE,principle_date4 DATE)

create table bond_type_c(rate1 INTEGER,rate2 INTEGER,put_date DATE,

call_date DATE)


这是一个很好的关系方法,但它不起作用,因为:


2.因此我们必须拥有数千种债券类型

成千上万的表是坏的。

3.客户需要能够通过UI动态构建债券类型并将其添加到系统中。显然,如果

在UI中创建的每种新类型的债券都会产生新表格,那将是不好的。


4.当用户加载债券时投资组合需要非常快。在

表格中,如果用户拥有100种不同的类型,则每种类型的表格如果在投资组合中你必须做100个连接。这是一个非常多的用户环境,所以它不是一个非首发。这是非常缓慢的。


解决方案


所以现在我们可以考虑采用每种债券类型的方法,我们可以考虑

后续解决方案(从
视图的关系点来看令人不快):


1.名称 - 价值对。


创建表键(bond_id INTEGER,bond_type INTEGER,attribute_id

INTEGER,value VARCHAR(255))


评论:客户端不喜欢这种方法,因为他们想要运行各种报告,因此他们不希望将值b / b
存储为VARCHAR。他们希望DB强制执行数据类型。

2.键入的名称 - 值对。


创建表键(bond_id INTEGER,bond_type INTEGER,attribute_id

INTEGER,int_val INTEGER,string_val VARCHAR(255),date_val DATE_


注释:客户端不喜欢这个,因为表格很稀疏。

每行有两个空字段。

3.链表与每种数据类型的表格。


创建表格债券(bond_id INTEGER)


create table bond_int_data(bond_id INTEGER REFERENCES bond(bond_id),

value INTEGER)

create table bond_string_data(bond_id INTEGER REFERENCES

债券(bond_id),价值VARCHAR(255))

创建表bond_date_data(bond_id INTEGER REFERENCES bond(bond_id),

value DATE)


评论:这符合大部分要求,但看起来很丑陋。

4.动态映射


创建表格( bond_id INTEGER,int_val1 INTEGER, int_val2 INTEGER,

date_val1 DATE,date_val2 DATE,string_val1 VARCHAR(255),string_val2

VARCHAR(255))


Then你必须在你的代码中添加一些动态映射,它将提供特定于绑定的映射(比如存储在XML文件中)。对于

示例,


对于bond_A:annual_rate映射到int_val1,issue_date映射到date_val1

对于bond_C:rate1映射到int_val1 ,rate2映射到int_val2,put_date

映射到date_val1,call_date映射到date_val2)


注释:这对于性能非常好,因为当我加载

不同债券类型的投资组合我可以用一个SELECT

语句将它们全部拉入。然而,这种方法有一个问题,即表格是稀疏的b
。每种类型的字段数量必须高达

来修改最复杂的债券,而简单的债券只会使用两个或三个b $ b。

问题:


我上面描述的四种方法是否详尽无遗?我忽略了其他

其他吗?

My company is working on a bond derivative portfolio analysis tool and
we''re facing a problem that I did not see adequately addressed any
where in literature. I really did RTFM. I''m very experienced in
relational modelling (10+ years) so this is not a case of not
understanding the principles. Here is the problem stripped of
irrelevant context. The problem below is simplified for the sake of the
example so don''t sweat the details.

THE PROBLEM

1. There are many types of bonds, each type has a different set of
attributes, different attribute names, different attribute datatypes.

For example, bond A has two variables: a yearly interest rate and
date of issue, B has five variables: an interest rate and 4 specific
dates on which various portions of principal need to be paid, bond C
has a set of 4 variables: interest rate in period 1, interest rate in
period 2, the date on which the bond can be put back to the issuer,
and two dates on which the bond can be called by the issue. And so on.

So, on the first attempt I could represent each bond type as its own
table. For example,

create table bond_type_a (rate INTEGER, issue_date DATE)
create table bond_type_b (rate INTEGER, principle_date1 DATE,
principle_date2 DATE, principle_date3 DATE, principle_date4 DATE)
create table bond_type_c (rate1 INTEGER, rate2 INTEGER, put_date DATE,
call_date DATE)

This is the nice relational approach but it does not work because:

2. There are many thousands of bond types thus we would have to have
many thousands of tables which is bad.
3. The client needs to be able construct the bond types on the fly
through the UI and add it to the system. Obviously, it would be bad if
each new type of bond created in the UI resulted in a new table.

4. When a user loads the bond portfolio it needs to be very fast. In
the table per type approach if a user has a 100 different types if bond
in the portfolio you would have to do 100 joins. This is a heavily
multi user environment so it''s a non-starter. It''s impossibly slow.

THE SOLUTIONS

So now that we ditched the table per bond type approach we can consider
the followiing solutions (unpleasant from the relational point of
view):

1. Name-Value pairs.

create table bonds (bond_id INTEGER, bond_type INTEGER, attribute_id
INTEGER, value VARCHAR(255))

Comment: The client does not like this approach because they want to
run various kinds of reports and thus they doe not want the values to
be stored as VARCHAR. They want the DB to enforce the datatype.
2. Typed Name-Value pairs.

create table bonds (bond_id INTEGER, bond_type INTEGER, attribute_id
INTEGER, int_val INTEGER, string_val VARCHAR(255), date_val DATE_

Comment: The client does not like this because the table is sparse.
Every row has two empty fields.
3. Link table with table per data type.

create table bonds (bond_id INTEGER)

create table bond_int_data (bond_id INTEGER REFERENCES bonds(bond_id),
value INTEGER)
create table bond_string_data (bond_id INTEGER REFERENCES
bonds(bond_id), value VARCHAR(255))
create table bond_date_data (bond_id INTEGER REFERENCES bonds(bond_id),
value DATE)

Comment: This meets most of the requirements but it just looks ugly.
4. Dynamic Mapping

create table (bond_id INTEGER, int_val1 INTEGER, int_val2 INTEGER,
date_val1 DATE, date_val2 DATE, string_val1 VARCHAR(255), string_val2
VARCHAR(255))

Then you have to add some dynamic mapping in your code which will
provide bond specific mapping (say, stored in an XML file). For
example,

For bond_A: yearly_rate maps to int_val1, issue_date maps to date_val1
For bond_C: rate1 maps to int_val1, rate2 maps to int_val2, put_date
maps to date_val1, call_date maps to date_val2)

Comment: This is very good for performance because when I load a
portfolio of different bond types I can pull them all in in one SELECT
statement. However this approach has a problem that the table is
sparse. The number of fields of each type has to be as high as to
accmodate the most complex bond while simple bonds will only be using
two or three.
THE QUESTIONS:

Are the four approaches I described above exhaustive? Are there any
other that I overlooked?

推荐答案



< ; RO ************* @ yahoo.com>在消息中写道

news:11 ********************** @ l41g2000cwc.googlegr oups.com ...

<ro*************@yahoo.com> wrote in message
news:11**********************@l41g2000cwc.googlegr oups.com...
我的公司正在研究债券衍生品投资组合分析工具,并且我们正面临一个问题,我没有看到任何文献中充分解决的问题。我真的做过RTFM。我在关系建模(10年以上)方面非常有经验,所以这不是不理解原理的情况。这是一个被剥夺了无关背景的问题。下面的问题是为了
示例而简化,所以不要赘述。
My company is working on a bond derivative portfolio analysis tool and
we''re facing a problem that I did not see adequately addressed any
where in literature. I really did RTFM. I''m very experienced in
relational modelling (10+ years) so this is not a case of not
understanding the principles. Here is the problem stripped of
irrelevant context. The problem below is simplified for the sake of the
example so don''t sweat the details.



[主要剪辑]


罗伯特,


通常情况下,当我看到像您这样的问题时,我建议您在关系模型概括中搜索一下谷歌




但是,我认为,从你提供的例子中,你已经知道了大多数人从这样的搜索中学到的东西。

无论如何你可能想要进行搜索,只是为了交叉检查你的结果。


接下来的是,恐怕不是非常有用。我写这篇文章是希望

它可以带领一些建设性的东西。


在我看来,问题的核心是用户可以要求

即时定义新类型的债券。


在经典的关系设计中,对主题的数据分析

揭示所有实体和实体之间的关系这是话语世界中的b $ b。从那里,添加描述所有相关数据值的属性和域

是很简单的,如果

乏味。


下一步是设计关系模型。在这一步中,得到了一个相当有限的,非常稳定的元组类型集。元组类型是全面的,因为它们允许在数据域方面陈述话语领域中的所有事实。当话语的范围扩大时,你只需要添加一个新的元组类型。


下一步是构建物理模型,并构建数据库。当

完成时,我们通常会发现添加现有

元组类型的新实例可以通过插入新数据来完成。另一方面,在
上创建新的元组类型需要更改元数据。这是经典的关系

设计和开发。


从你的问题大纲中,我很确定你已经知道了

我到目前为止所说的一切,即使你会用不同的方式表达




现在,这里是擦:你可以创建一个新的债券类型而不创建一个新的
元组类型?我认为答案是否定的。

您是否可以在不更改元数据的情况下创建新的元组类型?我认为

答案是否定的。

您能信任最终用户管理元数据吗?我认为答案

不是。


通常的解决方案,你称之为动态的是将元数据伪装成

数据,将其存储在用户表中。

这可能是你要做的最好的。但它仍然意味着你的

用户正在管理元数据,无论他们是否知道。


这不是答案,但它值得给予一些思考。


[major snip]

Robert,

Normally, when I see a question like yours, I recommend doing a google
search on "relational model generalization".

However, I think, from the examples you provide, that you already know the
things most people learn from such a search.
You might want to do the search anyway, just to cross check your
conclusions.

What follows is, I am afraid, not terribly helpful. I write it in the hope
that it can lead somewhere constructive.

The nut of the problem, it seems to me, is the requirement that users can
define new types of bonds on the fly.

In classical relational design, a data analysis of the subject matter
reveals all the "entities" and "relationships between entities" that are in
the universe of discourse. From there, adding the attributes and domains
that describe all of the relevant data values is straightforward, if
tedious.

The next step is to design a relational model. In this step a fairly
limited, and very stable set of tuple types is derived. The tuple types are
comprehensive, in the sense that they permit stating all the facts in the
universe of discourse in terms of the data domains. You only need to add a
new tuple type when the universe of discourse expands.

The next step is to build a physical model, and build the database. When
that''s done, we generally find that adding a new instance of an existing
tuple type can be done by inserting new data. Creating a new tuple type, on
the other hand, requires altering the metadata. That''s classical relational
design and development.

From your outline of the problem, I''m pretty sure you already know
everything I''ve said up to this point, even if you would have phrased it
differently.

Now, here''s the rub: Can you create a new bond type without creating a new
tuple type? I think the answer is no.
Can you create a new tuple type without altering metadata? I think the
answer is no.
Can you trust end users with the management of metadata? I think the answer
is no.

The usual solution, which you''ve called "dynamic" is to disguise metadata as
data, by storing it in user tables.
That may be the best you are going to do. But it still means that your
users are managing metadata, whether they know it or not.

This is not the answer, but it''s worth giving some thought.


< ro ************* @ yahoo.com>在消息中写道

新闻:11 ********************** @ l41g2000cwc.googlegr oups.com ......

< snip>
<ro*************@yahoo.com> wrote in message
news:11**********************@l41g2000cwc.googlegr oups.com...
<snip>
问题:

我上面描述的四种方法是否详尽无遗?还有其他我忽略的吗?
THE QUESTIONS:

Are the four approaches I described above exhaustive? Are there any
other that I overlooked?



我不愿提及这一点,但万一你愿意使用数据模型

除了关系,这是一个完美的工具应用程序,允许

弱打字,以及其他功能。如果您要在

XML文档中对此应用程序进行建模,例如,暂时忽略dtds和xsds,您将看到

表示数据的模式是有意义的用户也可以理解数据库产品的思想和思想。它对于今天的RDBMS实现来说简直不是很好,而且对于强类型的
持久性引擎也不是很好。


--dawn


I hesitate to mention this, but in case you are open to using data models
other than relational, this is a perfect application for tools that permit
weak typing, among other features. If you were to model this application in
XML documents, for example, ignoring dtds and xsds for now, you will see
that the pattern to the data that makes sense in your mind and the minds of
the users can also make sense to a database product. It simply isn''t a good
match for today''s RDBMS implementations, nor for a strongly typed
persistence engine.

--dawn


审核员和联邦政府是否知道任何人都可以在你的系统中发明一种新的

安全性?我对此表示怀疑。


经典场景需要一个具有所有常见

属性的根类,然后是它下面的专用子类。作为一个例子,

让我们拿车辆类并找到行业标准

标识符(VIN),并添加两个互斥的子类,Sport < b / b
多功能车和轿车(''SUV'',''SED'')。


CREATE TABLE车辆

(vin CHAR(17)NOT NULL PRIMARY KEY,

vehicle_type CHAR(3)NOT NULL

CHECK(vehicle_type IN(''SUV'',''SED'')) ,

UNIQUE(vin,vehicle_type),

..);


注意重叠的候选键。然后我使用复合候选人

键(vin,vehicle_type)和每个子类表中的约束来确定vehicle_type是否已锁定且与车辆<同意br />
表。添加一些DRI动作,你就完成了:


CREATE TABLE SUV

(vin CHAR(17)NOT NULL PRIMARY KEY,

vehicle_type CHAR(3)DEFAULT''SUV''非空

CHECK(vehicle_type =''SUV''),

UNIQUE(vin,vehicle_type),

FOREIGN KEY(vin,vehicle_type)

参考车辆(vin,vehicle_type)

ON UPDATE CASCADE

ON DELETE CASCADE ,

..);


CREATE TABLE轿车

(vin CHAR(17)NOT NULL PRIMARY KEY,
vehicle_type CHAR(3)DEFAULT''SED''非空

CHECK(vehicle_type =''SED''),

UNIQUE(vin,vehicle_type ),

FOREIGN KEY(vin,vehicle_type)

参考车辆(vin,vehicle_type)

ON UPDATE CASCADE

ON DELETE CASCADE,

..);


我可以继续构建这样的层次结构。例如,如果我有一个

轿车表分解成双门和四门轿车,我可以用这样的架构来获得



CREATE TABLE轿车

(vin CHAR(17)NOT NULL PRIMARY KEY,

vehicle_type CHAR(3)DEFAULT''SED''NOT NULL

CHECK(vehicle_type IN(''2DR'',''4DR'',''SED''),

UNIQUE(vin,vehicle_type),

FOREIGN KEY(vin,vehicle_type)

参考车辆(vin,vehicle_type)

ON UPDATE CASCADE

ON DELETE CASCADE ,

..);


CREATE TABLE TwoDoor

(vin CHAR(17)NOT NULL PRIMARY KEY,
vehicle_type CHAR(3)DEFAULT''2DR''NOT NULL

CHECK(vehicle_type ='''2DR''),

UNIQUE(vin,vehicle_type ),

FOREIGN KEY(vin,vehicle_type)

参考轿车(vin,vehicle_type)

ON UPDATE CASCADE

ON DELETE CASCADE,

..);

CREATE TABLE FourDoor

(vin CHAR(17)NOT NULL PRIMARY KEY,

vehicle_type CHAR(3)DEFAULT''4DR''NOT NULL

CHECK(vehicle_type =''4DR''),

UNIQUE(vin,vehicle_type),

FOREIGN KEY(vin,vehicle_type)

REFERENCES轿车(vin,vehicle_type)

ON UPDATE CASCADE

ON DELETE CASCADE,

..);


我们的想法是在UNIQUE()

约束中构建一系列标识符和类型,当您使用REFERENCES约束时,该约束会出现在树上。

显然,你可以做这个技巧的变种来获得不同的类

结构。


如果一个实体不必是专属的一个子类型,你玩

类层次结构的根:

CREATE TABLE车辆

(vin CHAR(17) NOT NULL,

vehicle_type CHAR(3)NOT NULL

CHECK(vehicle_type IN(''SUV'',''SED'')),

PRIMARY KEY(vin,vehicle_type),

..);


现在立即开始在VIEW中隐藏所有这些内容并添加一个INSTEAD

OF触发这些视图。

Do the auditors and the feds know that anyone can invent a new kind of
security in your system? I doubt it.

The classic scenario calls for a root class with all the common
attributes and then specialized sub-classes under it. As an example,
let''s take the class of Vehicles and find an industry standard
identifier (VIN), and add two mutually exclusive sub-classes, Sport
utility vehicles and sedans (''SUV'', ''SED'').

CREATE TABLE Vehicles
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) NOT NULL
CHECK(vehicle_type IN (''SUV'', ''SED'')),
UNIQUE (vin, vehicle_type),
..);

Notice the overlapping candidate keys. I then use a compound candidate
key (vin, vehicle_type) and a constraint in each sub-class table to
assure that the vehicle_type is locked and agrees with the Vehicles
table. Add some DRI actions and you are done:

CREATE TABLE SUV
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT ''SUV'' NOT NULL
CHECK(vehicle_type = ''SUV''),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Vehicles(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

CREATE TABLE Sedans
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT ''SED'' NOT NULL
CHECK(vehicle_type = ''SED''),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Vehicles(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

I can continue to build a hierarchy like this. For example, if I had a
Sedans table that broke down into two-door and four-door sedans, I
could a schema like this:

CREATE TABLE Sedans
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT ''SED'' NOT NULL
CHECK(vehicle_type IN (''2DR'', ''4DR'', ''SED'')),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Vehicles(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

CREATE TABLE TwoDoor
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT ''2DR'' NOT NULL
CHECK(vehicle_type = ''2DR''),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Sedans(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

CREATE TABLE FourDoor
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT ''4DR'' NOT NULL
CHECK(vehicle_type = ''4DR''),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Sedans (vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

The idea is to build a chain of identifiers and types in a UNIQUE()
constraint that go up the tree when you use a REFERENCES constraint.
Obviously, you can do variants of this trick to get different class
structures.

If an entity doesn''t have to be exclusively one subtype, you play with
the root of the class hierarchy:

CREATE TABLE Vehicles
(vin CHAR(17) NOT NULL,
vehicle_type CHAR(3) NOT NULL
CHECK(vehicle_type IN (''SUV'', ''SED'')),
PRIMARY KEY (vin, vehicle_type),
..);

Now start hiding all this stuff in VIEWs immediately and add an INSTEAD
OF trigger to those VIEWs.


这篇关于异构属性集的表示的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆