What is your opinion on using textual identifiers in table columns when approaching the database with normalization and scalability in mind?


Problem description



Which table structure is considered better normalized?

For example:

Note: idType tells which kind of item the comment was made on, and subjectid is the id of that item.

Using idType as the textually named identifier for the subjectid:

commentid ---- subjectid ----- idType
--------------------------------------
1                22            post
2                26            photo
3                84            reply
4                36            post
5                22            status

Compared to this:

commentid ---- postid ----- photoid ----- replyid
-----------------------------------------------
1                22          NULL        NULL
2                NULL         56         NULL
3                23          NULL        NULL
4                NULL        NULL        55
5                26          NULL        NULL

Looking at both, I don't think I could attach a foreign key constraint to the first table (i.e., so that a comment gets deleted when its post or photo is deleted), whereas in the second one that is possible. How would you approach a similar issue, keeping in mind that the database will need to expand over time and that data integrity is also important?

Thanks

Solution

The first is more normalized, if slightly incomplete. There are a couple of approaches you can take. The simplest (and, strictly speaking, the most 'correct') needs two tables, with the obvious FK constraint.

commentid ---- subjectid ----- idType
--------------------------------------
1                22            post
2                26            photo
3                84            reply
4                36            post
5                22            status

idType
------
post
photo
reply
status

If you like, you can use a char(1) or similar to reduce the impact of the varchar on key/index length, or to facilitate use with an ORM if you plan to use one. NULLs are always a bother, and if you start to see them turn up in your design, you will be better off if you can find a convenient way to eliminate them.
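As a rough illustration of this two-table approach, here is a minimal sketch using Python's built-in sqlite3 module. The table name `id_types` and the helper `try_insert` are assumptions made for the example; the column names follow the tables above. SQLite is used only because it is easy to run; the same DDL idea applies to other engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE id_types (
    idType TEXT PRIMARY KEY
);
CREATE TABLE comments (
    commentid INTEGER PRIMARY KEY,
    subjectid INTEGER NOT NULL,
    idType    TEXT NOT NULL REFERENCES id_types(idType)
);
""")

conn.executemany("INSERT INTO id_types VALUES (?)",
                 [("post",), ("photo",), ("reply",), ("status",)])
conn.executemany("INSERT INTO comments VALUES (?, ?, ?)",
                 [(1, 22, "post"), (2, 26, "photo"), (3, 84, "reply"),
                  (4, 36, "post"), (5, 22, "status")])

def try_insert(row):
    """Return True if the row is accepted, False if the FK rejects it."""
    try:
        conn.execute("INSERT INTO comments VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

# "video" is not listed in id_types, so the FK constraint refuses the row.
rejected = not try_insert((6, 99, "video"))
comment_count = conn.execute("SELECT COUNT(*) FROM comments").fetchone()[0]
```

Note that this only guarantees that idType is a known type. It does not guarantee that subjectid points at an existing post, photo, reply or status.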

The second approach is one I prefer when dealing with more than 100 million rows:

commentid ---- subjectid
------------------------
1                22    
2                26     
3                84     
4                36     
5                22     

postIds ---- subjectid
----------------------
1                22   
4                36   

photoIds ---- subjectid
-----------------------
2                26    

replyIds ---- subjectid
-----------------------
3                84    

statusIds ---- subjectid
------------------------
5                22     
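One possible reading of this layout, sketched with Python's sqlite3 module: each per-type table links a comment to its subject and carries a real FK to the parent table. The parent tables `posts` and `photos`, their column names, and the cleanup trigger are assumptions added for the example (the original only shows ids); the reply and status tables would follow the same pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # FK enforcement is opt-in in SQLite

conn.executescript("""
-- Hypothetical parent tables; only their ids matter here.
CREATE TABLE posts  (postid  INTEGER PRIMARY KEY);
CREATE TABLE photos (photoid INTEGER PRIMARY KEY);

CREATE TABLE comments (
    commentid INTEGER PRIMARY KEY,
    subjectid INTEGER NOT NULL
);

-- One link table per subject type, each with an FK to its own parent.
CREATE TABLE postIds (
    commentid INTEGER PRIMARY KEY
              REFERENCES comments(commentid) ON DELETE CASCADE,
    subjectid INTEGER NOT NULL REFERENCES posts(postid)
);
CREATE TABLE photoIds (
    commentid INTEGER PRIMARY KEY
              REFERENCES comments(commentid) ON DELETE CASCADE,
    subjectid INTEGER NOT NULL REFERENCES photos(photoid)
);

-- Assumed cleanup rule: deleting a post first deletes its comments,
-- and the cascade on postIds.commentid then removes the link rows.
CREATE TRIGGER posts_delete_comments BEFORE DELETE ON posts
BEGIN
    DELETE FROM comments
    WHERE commentid IN (SELECT commentid FROM postIds
                        WHERE subjectid = OLD.postid);
END;
""")

conn.executemany("INSERT INTO posts VALUES (?)", [(22,), (36,)])
conn.execute("INSERT INTO photos VALUES (26)")
conn.executemany("INSERT INTO comments VALUES (?, ?)",
                 [(1, 22), (2, 26), (4, 36)])
conn.executemany("INSERT INTO postIds VALUES (?, ?)", [(1, 22), (4, 36)])
conn.execute("INSERT INTO photoIds VALUES (2, 26)")

# A link to a post that does not exist is refused by the FK.
try:
    conn.execute("INSERT INTO postIds VALUES (2, 999)")
    dangling_link_rejected = False
except sqlite3.IntegrityError:
    dangling_link_rejected = True

# Deleting post 22 removes comment 1 and its link row.
conn.execute("DELETE FROM posts WHERE postid = 22")
remaining_comments = [r[0] for r in conn.execute(
    "SELECT commentid FROM comments ORDER BY commentid")]
remaining_links = conn.execute(
    "SELECT commentid, subjectid FROM postIds").fetchall()
```

This is the behaviour the question asks for (comments disappear with their post or photo), at the cost of one extra table per subject type.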

There is of course also the (slightly denormalized) hybrid approach, which I use extensively with large datasets, as they tend to be dirty. Simply provide the specialization tables for the pre-defined idTypes, but keep an ad hoc idType column on the commentId table.

Note that even the hybrid approach only requires 2x the space of the denormalized table, and it provides trivial query restriction by idType. The integrity constraint, however, is not straightforward, being an FK constraint on a derived UNION of the type tables. My general approach is to use a trigger on either the hybrid table or an equivalent updatable view to propagate updates to the correct sub-type table.

Both the simple approach and the more complex sub-type table approach work. Still, for most purposes KISS applies, so I suspect you should probably just introduce an ID_TYPES table and the relevant FK, and be done with it.
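The trigger idea can be sketched as follows, again with Python's sqlite3 module. This is only an illustration under assumptions: the trigger names are invented, only inserts are propagated (updates and deletes would need their own triggers), and "reply" and "status" are treated here as ad hoc types without a specialization table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hybrid table: keeps the ad hoc idType column next to subjectid.
CREATE TABLE comments (
    commentid INTEGER PRIMARY KEY,
    subjectid INTEGER NOT NULL,
    idType    TEXT NOT NULL
);
-- Specialization tables for the pre-defined types.
CREATE TABLE postIds  (commentid INTEGER PRIMARY KEY, subjectid INTEGER NOT NULL);
CREATE TABLE photoIds (commentid INTEGER PRIMARY KEY, subjectid INTEGER NOT NULL);

-- Route each new comment to the matching sub-type table.
CREATE TRIGGER comments_to_postIds AFTER INSERT ON comments
WHEN NEW.idType = 'post'
BEGIN
    INSERT INTO postIds VALUES (NEW.commentid, NEW.subjectid);
END;
CREATE TRIGGER comments_to_photoIds AFTER INSERT ON comments
WHEN NEW.idType = 'photo'
BEGIN
    INSERT INTO photoIds VALUES (NEW.commentid, NEW.subjectid);
END;
""")

conn.executemany("INSERT INTO comments VALUES (?, ?, ?)",
                 [(1, 22, "post"), (2, 26, "photo"), (3, 84, "reply"),
                  (4, 36, "post"), (5, 22, "status")])

post_rows = conn.execute(
    "SELECT commentid, subjectid FROM postIds ORDER BY commentid").fetchall()
photo_rows = conn.execute(
    "SELECT commentid, subjectid FROM photoIds ORDER BY commentid").fetchall()

# Restricting by type stays a plain WHERE clause on the hybrid table.
post_comment_ids = [r[0] for r in conn.execute(
    "SELECT commentid FROM comments WHERE idType = 'post' ORDER BY commentid")]
```

Rows with an ad hoc type stay only in the hybrid table, which is what makes this layout tolerant of dirty data.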
