How to design a database for User Defined Fields?


Problem Description


My requirements are:

• Need to be able to dynamically add User-Defined fields of any data type
• Need to be able to query UDFs quickly
• Need to be able to do calculations on UDFs based on datatype
• Need to be able to sort UDFs based on datatype

Other Information:

• I'm looking for performance primarily
• There are a few million Master records which can have UDF data attached
• When I last checked, there were over 50 million UDF records in our current database
• Most of the time, a UDF is only attached to a few thousand of the Master records, not all of them
• UDFs are not joined or used as keys. They're just data used for queries or reports

Options:

1. Create a big table with StringValue1, StringValue2... IntValue1, IntValue2, etc. I hate this idea, but will consider it if someone can tell me it is better than the other ideas and why.

2. Create a dynamic table which adds a new column on demand as needed. I also don't like this idea, since I feel performance would be slow unless you indexed every column.

3. Create a single table containing UDFName, UDFDataType, and Value. When a new UDF gets added, generate a View which pulls just that data and parses it into whatever type is specified. Items which don't meet the parsing criteria return NULL. (A sketch of how this could look follows this list.)

4. Create multiple UDF tables, one per data type. So we'd have tables for UDFStrings, UDFDates, etc. Probably would do the same as #2 and auto-generate a View anytime a new field gets added.

5. XML DataTypes? I haven't worked with these before but have seen them mentioned. Not sure if they'd give me the results I want, especially with performance.

6. Something else?
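
For illustration only, option #3 could be a single name/value table plus a generated, typed view per UDF. The table layout, the "WarrantyExpires" UDF, and the use of SQL Server's TRY_CAST are assumptions made for this sketch, not details from the question:

    -- Option #3: one name/value table holding every UDF value as text
    CREATE TABLE MasterUDF (
        MasterId    INT          NOT NULL,   -- key of the Master record the value belongs to
        UDFName     VARCHAR(100) NOT NULL,
        UDFDataType VARCHAR(20)  NOT NULL,   -- e.g. 'string', 'int', 'date'
        Value       VARCHAR(400) NULL,
        PRIMARY KEY (MasterId, UDFName)
    );

    -- Generated view for one UDF: pulls just that UDF's rows and casts the text
    -- back to the declared type; values that fail to parse come back as NULL.
    CREATE VIEW UDF_WarrantyExpires AS
    SELECT MasterId,
           TRY_CAST(Value AS DATE) AS WarrantyExpires
    FROM   MasterUDF
    WHERE  UDFName = 'WarrantyExpires';

(TRY_CAST is SQL Server syntax; other engines have their own safe-cast equivalents.)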

Solution

If performance is the primary concern, I would go with #6... a table per UDF (really, this is a variant of #2). This answer is tailored specifically to the data distribution and access patterns described in the question.
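
As a rough sketch of what a table-per-UDF layout could look like (the table names and the MasterId/Name columns of an assumed Master table are invented for the example, not taken from the answer):

    -- One narrow table per UDF: rows exist only for the Master records that
    -- actually carry a value, stored with a native type and real constraints.
    CREATE TABLE UDF_Color (
        MasterId INT         NOT NULL PRIMARY KEY
                 REFERENCES Master (MasterId),
        Color    VARCHAR(20) NOT NULL
    );

    CREATE TABLE UDF_Cost (
        MasterId INT           NOT NULL PRIMARY KEY
                 REFERENCES Master (MasterId),
        Cost     DECIMAL(10,2) NOT NULL
    );

    -- A query against a UDF only touches the small table and its index:
    SELECT m.Name, c.Cost
    FROM   Master m
    JOIN   UDF_Cost c ON c.MasterId = m.MasterId
    WHERE  c.Cost > 40;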

Pros:

1. Because you indicate that some UDFs have values for only a small portion of the overall data set, a separate table gives you the best performance, because that table will be only as large as it needs to be to support the UDF. The same holds true for the related indices.

2. You also get a speed boost by limiting the amount of data that has to be processed for aggregations or other transformations. Splitting the data out into multiple tables lets you perform some of the aggregating and other statistical analysis on the UDF data alone, then join that result to the master table via foreign key to get the non-aggregated attributes (see the sketch after this list).

3. You can use table/column names that reflect what the data actually is.

4. You have complete control to use data types, check constraints, default values, etc. to define the data domains. Don't underestimate the performance hit resulting from on-the-fly data type conversion. Such constraints also help RDBMS query optimizers develop more effective plans.

5. Should you ever need to use foreign keys, built-in declarative referential integrity is rarely out-performed by trigger-based or application-level constraint enforcement.
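
As an illustration of point 2, using the hypothetical UDF_Cost and Master tables from the earlier sketch: the statistic is computed over the small UDF table alone, and the result is then joined back to the master table for the non-aggregated attributes.

    -- Compute the aggregate over the UDF table only, then join to Master
    -- just to pick up the non-aggregated attributes (e.g. Name).
    SELECT m.Name,
           c.Cost,
           c.Cost - s.AvgCost AS DiffFromAvg
    FROM   UDF_Cost c
           CROSS JOIN (SELECT AVG(Cost) AS AvgCost FROM UDF_Cost) s
           JOIN Master m ON m.MasterId = c.MasterId;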

Cons:

1. This could create a lot of tables. Enforcing schema separation and/or a naming convention would alleviate this.

2. There is more application code needed to manage the UDF definitions and the tables behind them. I expect this is still less code than the original options 1, 3, & 4 would require.

Other Considerations:

1. If there is anything about the nature of the data that would make it sensible to group certain UDFs, that should be encouraged. That way, those data elements can be combined into a single table. For example, let's say you have UDFs for color, size, and cost. The tendency in the data is that most instances of this data look like

       'red', 'large', 45.03

   rather than

       NULL, 'medium', NULL

   In such a case, you won't incur a noticeable speed penalty by combining the 3 columns in 1 table, because few values would be NULL, and you avoid creating 2 more tables, which means 2 fewer joins when you need to access all 3 columns. (A sketch of such a grouped table follows this list.)

2. If you hit a performance wall from a UDF that is heavily populated and frequently used, then it should be considered for inclusion in the master table.

3. Logical table design can take you to a certain point, but when the record counts get truly massive, you should also start looking at what table partitioning options are provided by your RDBMS of choice.
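
A grouped table for the color/size/cost example in point 1 might look roughly like this (the table name and column types are guesses, and it would replace the separate per-UDF tables sketched earlier):

    -- Related UDFs that are usually populated together can share one table,
    -- so reading all three values costs a single join instead of three.
    CREATE TABLE UDF_ItemAttributes (
        MasterId INT           NOT NULL PRIMARY KEY
                 REFERENCES Master (MasterId),
        Color    VARCHAR(20)   NULL,
        Size     VARCHAR(10)   NULL,
        Cost     DECIMAL(10,2) NULL
    );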

