Storing and referencing an immutable ordered list in a relational database
Problem description
Background: I have a database containing parents and children's names (this is a simplification of the actual data, but the analogy is close enough).
Task: The database has to store an ordered list of children's names for each parent.
Assumptions:
- The database will contain millions of parents, possibly more.
- Parents typically have no more than 4 or 5 children, but rare (and even extreme) cases have to be supported as well.
- Children's names (as well as their ordering) tend to repeat a lot, so parents should reference some children_names_list_id instead of keeping a copy of the actual names.
- Children's names, as well as their ordering for a particular parent, are immutable.
- Insertion of new parents will be very frequent. When a new parent is inserted with a list of his children, if such a list of names already exists in the database, the new parent should reference the existing list identifier.
- Queries about names and their ordering should be possible (for example, find all parents that named a child "Bob" after naming a child "Alice", or find all parents that named a child "Alice", then had two more children, with the third named "Carol", etc.).
Questions:
- What's the best way to store such lists? The solution should be robust and support fast parent insertions.
- How should a parent reference the lists?
Current (proposed) solution:
My current approach is to have a table that maps children's names to integer name ids (names are long, integers are short).
Then store name lists in the following tuples: <list_id> <order> <name_id>
so the list table will look like this:
<list_id> <order> <name_id>
1 1 123
1 2 345
1 3 678
2 1 901
3 1 123
3 2 901
The example table contains three lists: [123,345,678], [901], [123,901] which might correspond to something like: ["Alice", "Bob", "Carol"], ["Dave"], ["Alice", "Dave"]
The parents table will then have a children_list_id column that references the list_id column.
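The tuple layout above can be tried out with a small sketch. This one uses Python's built-in sqlite3 module purely for illustration; the table names names and list_items, and the column name ord (chosen because ORDER is a reserved word in SQL), are assumptions rather than part of the original design. The last row uses position 2 so that list 3 reads [123, 901], as the text describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE names (
    name_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE list_items (
    list_id INTEGER NOT NULL,
    ord     INTEGER NOT NULL,   -- position within the list ("order" is reserved)
    name_id INTEGER NOT NULL REFERENCES names(name_id),
    PRIMARY KEY (list_id, ord)
);
""")

# Hypothetical name ids, matching the example in the text.
conn.executemany(
    "INSERT INTO names VALUES (?, ?)",
    [(123, "Alice"), (345, "Bob"), (678, "Carol"), (901, "Dave")],
)
conn.executemany(
    "INSERT INTO list_items VALUES (?, ?, ?)",
    [(1, 1, 123), (1, 2, 345), (1, 3, 678), (2, 1, 901), (3, 1, 123), (3, 2, 901)],
)

def read_lists(conn):
    """Rebuild every stored list as {list_id: [name, ...]} in stored order."""
    lists = {}
    rows = conn.execute("""
        SELECT li.list_id, n.name
        FROM list_items AS li
        JOIN names AS n ON n.name_id = li.name_id
        ORDER BY li.list_id, li.ord
    """)
    for list_id, name in rows:
        lists.setdefault(list_id, []).append(name)
    return lists

lists = read_lists(conn)
```

The composite primary key (list_id, ord) is what guarantees that each position in a list is filled at most once.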
This solution seems to be robust, except for two issues:
- I'm not sure whether insertions will be fast enough (looking up whether an existing list already exists seems like it could be slow), but other approaches seem to be less robust or (much) harder to query.
- The key to the name list table is composed of both the list_id and the order columns; the parents table has to reference only the list_id, which should be a foreign key, but since list_id isn't a key by itself in the list table, an additional table of lists, in which list_id is the key, is needed. This seems cumbersome.
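Both concerns can be explored with one sketch, again using Python's sqlite3 module. It adds the extra lists table so that parents can declare a real foreign key, and it gives that table a unique names_key column holding the comma-joined name ids, so that checking for an existing list becomes a single indexed lookup. The names_key column and the helper names get_or_create_list and insert_parent are assumptions introduced here for illustration, not something the question proposes; the names table is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE lists (
    list_id   INTEGER PRIMARY KEY,
    names_key TEXT NOT NULL UNIQUE      -- assumed canonical key, e.g. '123,345,678'
);
CREATE TABLE list_items (
    list_id INTEGER NOT NULL REFERENCES lists(list_id),
    ord     INTEGER NOT NULL,
    name_id INTEGER NOT NULL,
    PRIMARY KEY (list_id, ord)
);
CREATE TABLE parents (
    parent_id        INTEGER PRIMARY KEY,
    children_list_id INTEGER NOT NULL REFERENCES lists(list_id)
);
""")

def get_or_create_list(conn, name_ids):
    """Return the id of the list with exactly these name ids, creating it if needed."""
    key = ",".join(str(n) for n in name_ids)
    row = conn.execute(
        "SELECT list_id FROM lists WHERE names_key = ?", (key,)
    ).fetchone()
    if row is not None:
        return row[0]
    list_id = conn.execute(
        "INSERT INTO lists (names_key) VALUES (?)", (key,)
    ).lastrowid
    conn.executemany(
        "INSERT INTO list_items VALUES (?, ?, ?)",
        [(list_id, pos, n) for pos, n in enumerate(name_ids, start=1)],
    )
    return list_id

def insert_parent(conn, name_ids):
    """Insert a parent that references a shared list; returns (parent_id, list_id)."""
    list_id = get_or_create_list(conn, name_ids)
    parent_id = conn.execute(
        "INSERT INTO parents (children_list_id) VALUES (?)", (list_id,)
    ).lastrowid
    return parent_id, list_id

p1 = insert_parent(conn, [123, 345, 678])
p2 = insert_parent(conn, [901])
p3 = insert_parent(conn, [123, 345, 678])  # reuses the list created for p1
conn.commit()
```

Because the lists are immutable, the key never has to be recomputed. With concurrent writers, the UNIQUE constraint on names_key would reject a duplicate insert, so a real implementation would need to catch that error and retry the lookup.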
Alternate solution:
The table of lists will store implicit ordering in the columns:
<list_id> <name_1> <name_2> <name_3> <name_4> ... <name_100>
1 111 222 333 null
2 444 null
3 555 111 null
In this table, list_id will be the primary key.
The parents table will keep the list_id as a foreign key.
This solution is somewhat less robust (how many columns do I create? 10? 20? 50?), but makes insertions much quicker. And since list_id is a key, no additional tables are needed. A possible downside, however, is that some queries become much more complicated since they have to reference multiple columns.
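To make the query-complexity downside concrete, here is a rough sketch of an "A appears before B" query against the wide layout, written with Python's sqlite3 module. It uses only 4 name columns to stay short, and the function name lists_with_a_before_b is an assumption for illustration. The condition has to enumerate every ordered pair of columns, so it grows quadratically: 6 pairs for 4 columns, 4950 pairs for 100 columns.

```python
import sqlite3
from itertools import combinations

N_COLS = 4  # kept small for the sketch; the question contemplates up to 100
cols = [f"name_{i}" for i in range(1, N_COLS + 1)]

conn = sqlite3.connect(":memory:")
col_defs = ", ".join(f"{c} INTEGER" for c in cols)
conn.execute(f"CREATE TABLE lists (list_id INTEGER PRIMARY KEY, {col_defs})")
conn.executemany(
    "INSERT INTO lists VALUES (?, ?, ?, ?, ?)",
    [
        (1, 111, 222, 333, None),
        (2, 444, None, None, None),
        (3, 555, 111, None, None),
    ],
)

def lists_with_a_before_b(conn, a, b):
    """Ids of lists in which name id `a` sits in an earlier column than `b`."""
    # combinations() yields (earlier_column, later_column) pairs in column order.
    pairs = list(combinations(cols, 2))
    where = " OR ".join(f"({x} = ? AND {y} = ?)" for x, y in pairs)
    params = [a, b] * len(pairs)  # one (a, b) pair of parameters per clause
    sql = f"SELECT list_id FROM lists WHERE {where} ORDER BY list_id"
    return [row[0] for row in conn.execute(sql, params)]

n_pairs_for_100_columns = len(list(combinations(range(100), 2)))
```

A single index cannot serve such a query either, since the searched value may sit in any column.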
Thanks!
The list table is over-designed. Just have a Parents table, a Names table and a ParentChildren table. The ParentChildren table is just like your list table, except for a few details. It would look like:
<ParentId> <Order> <NameId>
1 1 123
1 2 345
1 3 678
2 1 901
3 1 123
3 2 901
I don't see any particular savings in storing independent lists. Just store the children for each parent.
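As a rough illustration of how the ordering queries from the question could run against this three-table layout, here is a sketch using Python's sqlite3 module. The column is called Ord because ORDER is a reserved word in SQL; the index, the helper name parents_with_order and its offset parameter are assumptions added for the example. The name ids follow the mapping suggested in the question, and parent 3's second child is stored at position 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Parents (ParentId INTEGER PRIMARY KEY);
CREATE TABLE Names (NameId INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
CREATE TABLE ParentChildren (
    ParentId INTEGER NOT NULL REFERENCES Parents(ParentId),
    Ord      INTEGER NOT NULL,
    NameId   INTEGER NOT NULL REFERENCES Names(NameId),
    PRIMARY KEY (ParentId, Ord)
);
-- Assumed helper index so lookups by name do not scan the whole table.
CREATE INDEX idx_pc_name ON ParentChildren (NameId, ParentId, Ord);
""")
conn.executemany("INSERT INTO Parents VALUES (?)", [(1,), (2,), (3,)])
conn.executemany(
    "INSERT INTO Names VALUES (?, ?)",
    [(123, "Alice"), (345, "Bob"), (678, "Carol"), (901, "Dave")],
)
conn.executemany(
    "INSERT INTO ParentChildren VALUES (?, ?, ?)",
    [(1, 1, 123), (1, 2, 345), (1, 3, 678), (2, 1, 901), (3, 1, 123), (3, 2, 901)],
)

def parents_with_order(conn, first, later, offset=None):
    """Parents who named a child `first` and, afterwards, a child `later`.

    With offset=None any later position matches; with offset=k the second
    child must come exactly k positions after the first.
    """
    position_test = "b.Ord > a.Ord" if offset is None else "b.Ord = a.Ord + ?"
    sql = f"""
        SELECT DISTINCT a.ParentId
        FROM ParentChildren AS a
        JOIN ParentChildren AS b ON b.ParentId = a.ParentId
        JOIN Names AS na ON na.NameId = a.NameId
        JOIN Names AS nb ON nb.NameId = b.NameId
        WHERE na.Name = ? AND nb.Name = ? AND {position_test}
        ORDER BY a.ParentId
    """
    params = [first, later] if offset is None else [first, later, offset]
    return [row[0] for row in conn.execute(sql, params)]

bob_after_alice = parents_with_order(conn, "Alice", "Bob")
carol_two_after_alice = parents_with_order(conn, "Alice", "Carol", offset=2)
```

How "two more children, with the third named Carol" maps to an offset is open to interpretation, which is why the offset is left as a parameter rather than hard-coded.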