Is storing a delimited list in a database column really that bad?


Question

Imagine a web form with a set of check boxes (any or all of them can be selected). I chose to save the selections as a comma-separated list of values stored in one column of the database table.

Now, I know that the correct solution would be to create a second table and properly normalize the database. It was quicker to implement the easy solution, and I wanted a proof-of-concept of the application quickly, without having to spend too much time on it.

I thought the saved time and simpler code were worth it in my situation. Is this a defensible design choice, or should I have normalized it from the start?

Some more context: this is a small internal application that essentially replaces an Excel file that was stored on a shared folder. I'm also asking because I'm thinking about cleaning up the program to make it more maintainable. There are some things in there I'm not entirely happy with, and one of them is the topic of this question.

Recommended answer

In addition to violating First Normal Form because of the repeating group of values stored in a single column, comma-separated lists have a lot of other more practical problems:


  • Can't ensure that each value is the right data type: no way to prevent 1,2,3,banana,5
  • Can't use foreign key constraints to link values to a lookup table; no way to enforce referential integrity.
  • Can't enforce uniqueness: no way to prevent 1,2,3,3,3,5
  • Can't delete a value from the list without fetching the whole list.
  • Can't store a list longer than what fits in the string column.
  • Hard to search for all entities with a given value in the list; you have to use an inefficient table-scan. May have to resort to regular expressions, for example in MySQL:
    idlist REGEXP '[[:<:]]2[[:>:]]'
  • Hard to count elements in the list, or do other aggregate queries.
  • Hard to join the values to the lookup table they reference.
  • Hard to fetch the list in sorted order.
  • Storing integers as strings takes about twice as much space as storing binary integers. Not to mention the space taken by the comma characters.
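For contrast, here is a minimal sketch of the normalized alternative, using Python's built-in sqlite3 module so that it runs on its own. The table and column names (options, submissions, submission_options) are hypothetical and do not come from the question; the point is only to show the database itself handling several of the problems listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE options (             -- hypothetical lookup table: one row per check box
        id    INTEGER PRIMARY KEY,
        label TEXT NOT NULL
    );
    CREATE TABLE submissions (         -- hypothetical table: one row per saved form
        id INTEGER PRIMARY KEY
    );
    CREATE TABLE submission_options (  -- one row per ticked box
        submission_id INTEGER NOT NULL REFERENCES submissions(id),
        option_id     INTEGER NOT NULL REFERENCES options(id),
        PRIMARY KEY (submission_id, option_id)
    );
""")
conn.executemany("INSERT INTO options VALUES (?, ?)",
                 [(1, "red"), (2, "green"), (3, "blue")])
conn.executemany("INSERT INTO submissions VALUES (?)", [(10,), (11,)])
conn.executemany("INSERT INTO submission_options VALUES (?, ?)",
                 [(10, 1), (10, 2), (11, 2), (11, 3)])


def try_insert(submission_id, option_id):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO submission_options VALUES (?, ?)",
                     (submission_id, option_id))
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_ok = try_insert(10, 2)      # rejected by the composite primary key
unknown_ok = try_insert(10, 99)       # rejected by the foreign key: no option 99
# SQLite's loose typing would store the text, so here the foreign key does the
# rejecting; a strictly typed database would refuse it on the column type alone.
banana_ok = try_insert(10, "banana")

# Search: an equality predicate that an index can serve, no regular expression.
with_green = [row[0] for row in conn.execute(
    "SELECT submission_id FROM submission_options "
    "WHERE option_id = 2 ORDER BY submission_id")]

# Join to the lookup table and fetch in sorted order.
labels_11 = [row[0] for row in conn.execute(
    "SELECT o.label FROM submission_options so "
    "JOIN options o ON o.id = so.option_id "
    "WHERE so.submission_id = 11 ORDER BY o.label")]

# Delete one value without fetching the rest, then count what remains.
conn.execute("DELETE FROM submission_options WHERE submission_id = 10 AND option_id = 1")
count_10 = conn.execute(
    "SELECT COUNT(*) FROM submission_options WHERE submission_id = 10").fetchone()[0]
```

The same schema should carry over to MySQL or another RDBMS with only small syntax changes, although the details of type checking and foreign key enforcement differ between engines.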

To solve these problems, you have to write tons of application code, reinventing functionality that the RDBMS already provides much more efficiently.
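As a rough illustration of that reinvented functionality, the sketch below shows the kind of helper code a comma-separated column tends to require. The function names are hypothetical, and the sketch assumes the column holds integer ids; it is not taken from the original application.

```python
def parse_id_list(raw):
    """Split a comma-separated column value into integers.

    Raises ValueError on junk such as 'banana', which the column itself
    cannot prevent from being stored.
    """
    return [int(part) for part in raw.split(",") if part.strip()]


def contains_id(raw, target):
    """Membership test; the whole list has to be parsed first."""
    return target in parse_id_list(raw)


def remove_id(raw, target):
    """Remove one id by reading, filtering, and rewriting the whole list."""
    return ",".join(str(i) for i in parse_id_list(raw) if i != target)


def dedupe(raw):
    """Drop repeated ids while keeping first-seen order."""
    return ",".join(str(i) for i in dict.fromkeys(parse_id_list(raw)))


# A naive substring test is wrong: "2" appears inside "12" and "20".
naive_match = "2" in "12,20,35"
correct_match = contains_id("12,20,35", 2)

cleaned = dedupe("1,2,3,3,3,5")
shorter = remove_id("1,2,3", 2)

try:
    parse_id_list("1,2,3,banana,5")
    rejected_banana = False
except ValueError:
    rejected_banana = True
```

Each helper stands in for something a normalized table gets from the database: a typed column, a unique key, a single-row DELETE, or an indexed equality lookup.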

Comma-separated lists are wrong enough that I made this the first chapter in my book: SQL Antipatterns: Avoiding the Pitfalls of Database Programming.

There are times when you need to employ denormalization, but as @OMG Ponies mentions, these are exception cases. Any non-relational "optimization" benefits one type of query at the expense of other uses of the data, so be sure you know which of your queries need to be treated so specially that they deserve denormalization.
