ERD draft for creating grouped data collection list

Problem description


This is a follow-up to my earlier question, "Table design about sets of data collection elements", as I am still trying to come up with a design.

What I would like to do is pre-define what each study/protocol pair requires as data collection, displayed as a to-do list or checklist that can be tracked at patients' clinic visits. Attached is what I have so far, with possible examples in each table, but I have never implemented a supertype/subtype relationship, so I am not sure whether I am on the right track. Is it overly normalized? Or should I even bother going with supertype/subtype?

Any thoughts/feedback would help.

EDIT

@YoungBob First of all, thanks a lot for your input. FormId (PK) is also a foreign key to DataCollectionId, so I can query either table with the same ID by joining on DataCollection.DataCollectionId = Form.FormId to get the attributes from both levels, no?

I will not provide an interface to create these forms on the fly, which is why I didn't want to include sections or question types, but I like the idea of including version control.

As you mentioned, I will load it with test data to check the performance and see whether I should de-normalise any tables.

Since posting the question I have added the link for DataCollectionIntervals as you suggested, in this manner. Does it look better?

http://imageshack.us/f/716/erd02.png/

Solution

The schema design looks fine to me, at least based on the information you gave in this and the previous post. Best practice is to start with a normalised design and then de-normalise where you think query optimisation is needed. I'm guessing the database isn't going to be massive or have a high rate of transactions, so performance shouldn't be an issue, and I would stick with the normalised design. As a rule of thumb, denormalisation may be worthwhile if you need to write queries that join more than 4 tables (in SQL Server at least), but I can't really see that happening with this schema design.

As you suggest in your question, the Form and Sample tables could be candidates for denormalisation by including the attributes of both within the DataCollection table, but this will depend on how many other attributes Form and Sample have and how many are common to both.
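For reference, the normalised shared-key layout described in the question can be sketched as below. This is only an illustration using Python's built-in sqlite3 module rather than SQL Server; the table and key names (DataCollection, Form, Sample, DataCollectionId, FormId) come from the discussion, while SampleId and every other column and all sample rows are assumptions made up for the example.

```python
import sqlite3

# In-memory database; SQLite stands in for SQL Server purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Supertype DataCollection with two subtypes that reuse its key.
# Name, CollectionType, Title and Specimen are hypothetical columns.
conn.executescript("""
CREATE TABLE DataCollection (
    DataCollectionId INTEGER PRIMARY KEY,
    Name             TEXT NOT NULL,
    CollectionType   TEXT NOT NULL CHECK (CollectionType IN ('Form', 'Sample'))
);
CREATE TABLE Form (
    FormId INTEGER PRIMARY KEY REFERENCES DataCollection(DataCollectionId),
    Title  TEXT NOT NULL
);
CREATE TABLE Sample (
    SampleId INTEGER PRIMARY KEY REFERENCES DataCollection(DataCollectionId),
    Specimen TEXT NOT NULL
);
""")

conn.execute("INSERT INTO DataCollection VALUES (1, 'Baseline questionnaire', 'Form')")
conn.execute("INSERT INTO DataCollection VALUES (2, 'Blood draw', 'Sample')")
conn.execute("INSERT INTO Form VALUES (1, 'Quality of life')")
conn.execute("INSERT INTO Sample VALUES (2, 'Whole blood')")

# One join on the shared key returns both supertype and subtype attributes.
form_rows = conn.execute("""
    SELECT dc.DataCollectionId, dc.Name, f.Title
    FROM DataCollection AS dc
    JOIN Form AS f ON dc.DataCollectionId = f.FormId
""").fetchall()
print(form_rows)
```

The denormalised alternative would move Title and Specimen into DataCollection as nullable columns and drop the two subtype tables, which saves the join at the cost of columns that are empty for half the rows.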

One tip I would give is to consider giving the Form table a primary key that is a short character string, assuming you have fairly standard forms. I find this makes life a bit easier when browsing tables (a bit like HMRC forms P45, P60, etc., or airport codes LHR, JFK, etc.), as you don't have to keep joining with the other tables to remember which form a particular int ID refers to. A CHAR(3) field also uses less storage than an INT. This may apply to other tables, such as DataCollectionType. But this is probably a matter of personal preference.
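A minimal sketch of the short-code key idea, again using Python's sqlite3 for runnability. SQLite has no fixed-width CHAR type, so a CHECK on the length stands in for CHAR(3); the VisitForm table, the form codes and the titles are hypothetical and only serve to show a referencing table that stays readable without a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# FormCode is a three-character natural key; VisitForm is a made-up
# referencing table used only to show the effect.
conn.executescript("""
CREATE TABLE Form (
    FormCode TEXT PRIMARY KEY CHECK (length(FormCode) = 3),
    Title    TEXT NOT NULL
);
CREATE TABLE VisitForm (
    VisitId  INTEGER NOT NULL,
    FormCode TEXT NOT NULL REFERENCES Form(FormCode),
    PRIMARY KEY (VisitId, FormCode)
);
""")

conn.executemany(
    "INSERT INTO Form VALUES (?, ?)",
    [("QOL", "Quality of life"), ("AEV", "Adverse events")],
)
conn.executemany(
    "INSERT INTO VisitForm VALUES (?, ?)",
    [(1, "QOL"), (1, "AEV"), (2, "QOL")],
)

# Browsing the referencing table alone already tells you which forms are meant.
visit_forms = conn.execute(
    "SELECT VisitId, FormCode FROM VisitForm ORDER BY VisitId, FormCode"
).fetchall()
print(visit_forms)
```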

Following our discussion in the previous post, the DataFrequency table we talked about should probably be a many-to-one link to the DataCollection table. Perhaps DataCollectionIntervals would be a better name for it.
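The many-to-one link could look roughly like this. Only the table names DataCollection and DataCollectionIntervals come from the discussion; IntervalId, VisitMonth and the sample rows are assumptions for illustration, and sqlite3 is used in place of SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Many interval rows point at one DataCollection row.
conn.executescript("""
CREATE TABLE DataCollection (
    DataCollectionId INTEGER PRIMARY KEY,
    Name             TEXT NOT NULL
);
CREATE TABLE DataCollectionIntervals (
    IntervalId       INTEGER PRIMARY KEY,
    DataCollectionId INTEGER NOT NULL
        REFERENCES DataCollection(DataCollectionId),
    VisitMonth       INTEGER NOT NULL
);
""")

conn.execute("INSERT INTO DataCollection VALUES (1, 'Blood draw')")
conn.executemany(
    "INSERT INTO DataCollectionIntervals (DataCollectionId, VisitMonth) VALUES (?, ?)",
    [(1, 0), (1, 6), (1, 12)],
)

# All scheduled intervals for one data collection item.
months = [
    row[0]
    for row in conn.execute(
        "SELECT VisitMonth FROM DataCollectionIntervals "
        "WHERE DataCollectionId = ? ORDER BY VisitMonth",
        (1,),
    )
]

# The foreign key rejects an interval that points at no data collection.
try:
    conn.execute(
        "INSERT INTO DataCollectionIntervals (DataCollectionId, VisitMonth) VALUES (99, 3)"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(months, orphan_rejected)
```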

One other thing to think about in the design is whether some frequently accessed tables would benefit from vertical splitting. By this I mean that if a table has wide rows, i.e. a lot of attributes or storage-hungry attributes like VARCHAR(MAX), the infrequently accessed ones can be split off into a separate table with a 1-1 link, which can significantly improve the performance of queries involving this table. But as I say, I don't really see performance being an issue with the size of database you're planning, assuming you'll be using something like SQL Server.
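As a rough sketch of vertical splitting, the example below keeps the frequently read columns in Form and moves a bulky, rarely read column into a 1-1 side table. FormDetail, Instructions and the sample data are hypothetical names chosen for the example, and sqlite3 again stands in for SQL Server, so it shows the shape of the design rather than any performance gain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Narrow table for everyday queries, plus a 1-1 table for the wide column.
conn.executescript("""
CREATE TABLE Form (
    FormId INTEGER PRIMARY KEY,
    Title  TEXT NOT NULL
);
CREATE TABLE FormDetail (
    FormId       INTEGER PRIMARY KEY REFERENCES Form(FormId),
    Instructions TEXT NOT NULL
);
""")

conn.execute("INSERT INTO Form VALUES (1, 'Quality of life')")
conn.execute(
    "INSERT INTO FormDetail VALUES (1, ?)",
    ("Long guidance text " * 50,),
)

# Listing forms touches only the narrow table.
form_list = conn.execute("SELECT FormId, Title FROM Form").fetchall()

# The wide column is joined in only when it is actually needed.
detail = conn.execute(
    "SELECT f.Title, d.Instructions "
    "FROM Form AS f JOIN FormDetail AS d ON d.FormId = f.FormId "
    "WHERE f.FormId = ?",
    (1,),
).fetchone()

print(form_list, len(detail[1]))
```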

And one final thing: the structure of Forms may be a bit more complex than the schema currently indicates. For example, forms are typically broken into several sections, and the question types needed can be quite complex (multiple choice, text, branching, conditional). Forms can also exist in different versions (use an Active flag to identify the currently active version in the Forms table). I've looked at using queXML myself to design a questionnaire in XML, but decided it was overkill for what I needed, so I settled on a simpler XML schema of my own which can be imported into the database.
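One possible way to model the Active flag is sketched below. The composite key (FormCode, Version) and the partial unique index that allows at most one active version per form are assumptions of this example, not something stated in the answer; sqlite3 is used for runnability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each form can have several versions; Active marks the current one.
# The partial unique index permits only one Active = 1 row per FormCode.
conn.executescript("""
CREATE TABLE Form (
    FormCode TEXT    NOT NULL,
    Version  INTEGER NOT NULL,
    Active   INTEGER NOT NULL DEFAULT 0 CHECK (Active IN (0, 1)),
    PRIMARY KEY (FormCode, Version)
);
CREATE UNIQUE INDEX OneActiveVersionPerForm
    ON Form(FormCode) WHERE Active = 1;
""")

conn.executemany(
    "INSERT INTO Form VALUES (?, ?, ?)",
    [("QOL", 1, 0), ("QOL", 2, 1)],
)

# Look up the currently active version of every form.
active = conn.execute(
    "SELECT FormCode, Version FROM Form WHERE Active = 1"
).fetchall()

# A second active version of the same form violates the index.
try:
    conn.execute("INSERT INTO Form VALUES ('QOL', 3, 1)")
    second_active_rejected = False
except sqlite3.IntegrityError:
    second_active_rejected = True

print(active, second_active_rejected)
```

Publishing a new version would then mean clearing Active on the old row and setting it on the new one inside a single transaction.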
