在数据库列中存储分隔列表真的那么糟糕吗? [英] Is storing a delimited list in a database column really that bad?

查看:25
本文介绍了在数据库列中存储分隔列表真的那么糟糕吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

想象一个带有一组复选框的网络表单(可以选择任何一个或全部).我选择将它们保存在以逗号分隔的值列表中,这些值存储在数据库表的一列中.

Imagine a web form with a set of check boxes (any or all of them can be selected). I chose to save them in a comma separated list of values stored in one column of the database table.

现在,我知道正确的解决方案是创建第二个表并正确规范化数据库.实施简单的解决方案更快,而且我希望快速获得该应用程序的概念验证,而不必在上面花费太多时间.

Now, I know that the correct solution would be to create a second table and properly normalize the database. It was quicker to implement the easy solution, and I wanted to have a proof-of-concept of that application quickly and without having to spend too much time on it.

我认为在我的情况下,节省的时间和更简单的代码是值得的,这是一种防御性的设计选择,还是应该从一开始就将其标准化?

I thought the saved time and simpler code was worth it in my situation, is this a defensible design choice, or should I have normalized it from the start?

更多上下文,这是一个小型内部应用程序,它基本上替换了存储在共享文件夹中的 Excel 文件.我也在问,因为我正在考虑清理程序并使其更易于维护.里面有些东西我并不完全满意,其中之一就是这个问题的主题.

Some more context, this is a small internal application that essentially replaces an Excel file that was stored on a shared folder. I'm also asking because I'm thinking about cleaning up the program and make it more maintainable. There are some things in there I'm not entirely happy with, one of them is the topic of this question.

推荐答案

除了违反第一范式 因为在单列中存储了重复的一组值,逗号分隔的列表还有很多其他更实际的问题:

In addition to violating First Normal Form because of the repeating group of values stored in a single column, comma-separated lists have a lot of other more practical problems:

  • 无法确保每个值都是正确的数据类型:无法阻止1,2,3,banana,5
  • 不能使用外键约束将值链接到查找表;无法强制执行参照完整性.
  • 无法强制唯一性:无法阻止1,2,3,3,3,5
  • 无法在不获取整个列表的情况下从列表中删除值.
  • 不能存储超过字符串列长度的列表.
  • 难以搜索列表中具有给定值的所有实体;您必须使用低效的表扫描.可能不得不求助于正则表达式,例如在 MySQL 中:
    idlist REGEXP '[[:<:]]2[[:>:]]' 或在 MySQL 8.0 中:idlist REGEXP '\b2\b'
  • 难以统计列表中的元素,或进行其他聚合查询.
  • 很难将值加入到它们引用的查找表中.
  • 难以按排序顺序获取列表.
  • 很难选择一个保证不会出现在值中的分隔符
  • Can’t ensure that each value is the right data type: no way to prevent 1,2,3,banana,5
  • Can’t use foreign key constraints to link values to a lookup table; no way to enforce referential integrity.
  • Can’t enforce uniqueness: no way to prevent 1,2,3,3,3,5
  • Can’t delete a value from the list without fetching the whole list.
  • Can't store a list longer than what fits in the string column.
  • Hard to search for all entities with a given value in the list; you have to use an inefficient table-scan. May have to resort to regular expressions, for example in MySQL:
    idlist REGEXP '[[:<:]]2[[:>:]]' or in MySQL 8.0: idlist REGEXP '\b2\b'
  • Hard to count elements in the list, or do other aggregate queries.
  • Hard to join the values to the lookup table they reference.
  • Hard to fetch the list in sorted order.
  • Hard to choose a separator that is guaranteed not to appear in the values

要解决这些问题,您必须编写大量应用程序代码,重新发明 RDBMS 已经提供的功能更有效.

To solve these problems, you have to write tons of application code, reinventing functionality that the RDBMS already provides much more efficiently.

逗号分隔的列表是错误的,我把它作为我书中的第一章:SQL 反模式:避免数据库编程的陷阱.

Comma-separated lists are wrong enough that I made this the first chapter in my book: SQL Antipatterns: Avoiding the Pitfalls of Database Programming.

有时您需要使用非规范化,但正如 @OMG Ponies 提到的,这些都是例外情况.任何非关系优化"都会使一种类型的查询受益,但会牺牲数据的其他用途,因此请确保您知道哪些查询需要得到如此特殊的处理以使其值得非规范化.

There are times when you need to employ denormalization, but as @OMG Ponies mentions, these are exception cases. Any non-relational "optimization" benefits one type of query at the expense of other uses of the data, so be sure you know which of your queries need to be treated so specially that they deserve denormalization.

这篇关于在数据库列中存储分隔列表真的那么糟糕吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆