Storing and referencing an immutable ordered list in a relational database


Problem description

Background: I have a database containing parents and children's names (this is a simplification of the actual data, but the analogy is close enough).

Task: The database has to store an ordered list of children's names for each parent.

Assumptions:

  1. The database will contain millions of parents, possibly more.
  2. Parents typically have no more than 4 or 5 children, but rare (and even extreme) cases have to be supported as well.
  3. Children's names (as well as their ordering) tend to repeat a lot. So parents should reference some children_names_list_id instead of keeping a copy of the actual names.
  4. Children's names as well as their ordering for a particular parent are immutable.
  5. Insertion of new parents will be very frequent. When a new parent is inserted with a list of their children, if such a list of names already exists in the database, the new parent should reference the existing list identifier.
  6. Queries about names and their ordering should be possible (for example: find all parents that named a child "Bob" after naming a child "Alice", or find all parents that named a child "Alice", then had two more children, with the third named "Carol", etc.)

Questions:

  1. What's the best way to store such lists? The solution should be robust and support fast parent insertions.
  2. How should a parent reference the lists?

Current (proposed) solution:

My current approach is to have a table that maps children's names to integer name ids (names are long, integers are short). Name lists are then stored as tuples of the form <list_id> <order> <name_id>, so the list table will look like this:

<list_id> <order> <name_id>
    1       1       123
    1       2       345
    1       3       678
    2       1       901
    3       1       123
    3       2       901

The example table contains three lists: [123,345,678], [901], [123,901], which might correspond to something like: ["Alice", "Bob", "Carol"], ["Dave"], ["Alice", "Dave"].

The parents table will then have a children_list_id column that references the list_id column.
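As a rough illustration of the layout described above, the following sketch builds the names table and the list table in an in-memory SQLite database through Python's sqlite3 module, then reads the three example lists back in order. The table names `names` and `list_items`, the column name `position` (standing in for <order>, since ORDER is an SQL keyword), and the choice of SQLite are assumptions made for the example, not part of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE names (
    name_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE list_items (
    list_id  INTEGER NOT NULL,
    position INTEGER NOT NULL,  -- assumed name for the <order> column
    name_id  INTEGER NOT NULL REFERENCES names(name_id),
    PRIMARY KEY (list_id, position)  -- one name per slot in each list
);
""")

# Example data: name ids from the question, names as suggested there.
conn.executemany(
    "INSERT INTO names VALUES (?, ?)",
    [(123, "Alice"), (345, "Bob"), (678, "Carol"), (901, "Dave")],
)
conn.executemany(
    "INSERT INTO list_items VALUES (?, ?, ?)",
    [(1, 1, 123), (1, 2, 345), (1, 3, 678), (2, 1, 901), (3, 1, 123), (3, 2, 901)],
)

def read_list(conn, list_id):
    """Return the names of one list, in stored order."""
    cur = conn.execute(
        "SELECT n.name FROM list_items li "
        "JOIN names n ON n.name_id = li.name_id "
        "WHERE li.list_id = ? ORDER BY li.position",
        (list_id,),
    )
    return [row[0] for row in cur]

all_lists = {list_id: read_list(conn, list_id) for list_id in (1, 2, 3)}
print(all_lists)
```

The composite primary key (list_id, position) is what guarantees a well-defined order within each list.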

This solution seems to be robust, except for two issues:

  1. I'm not sure whether insertions will be fast enough (checking whether a given list already exists seems like it could be slow), but other approaches seem to be less robust or (much) harder to query.
  2. The key of the name list table is composed of both the list_id and the order columns. The parents table has to reference only list_id, which should be a foreign key, but since list_id isn't a key by itself in the list table, an additional table of lists, in which list_id is the key, is needed. This seems cumbersome.
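One possible way to address both concerns at once, sketched here under stated assumptions, is to let the additional table of lists also carry a canonical key for each list (for instance the name ids joined by commas) under a UNIQUE constraint. Checking whether a list already exists then becomes a single index lookup. The `lists.list_key` column and the helper functions `get_or_create_list` and `insert_parent` are hypothetical names invented for this illustration, and SQLite via Python's sqlite3 is used only to keep the sketch runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE lists (
    list_id  INTEGER PRIMARY KEY,
    list_key TEXT NOT NULL UNIQUE  -- hypothetical: canonical form of the ordered name ids
);
CREATE TABLE list_items (
    list_id  INTEGER NOT NULL REFERENCES lists(list_id),
    position INTEGER NOT NULL,     -- assumed name for the <order> column
    name_id  INTEGER NOT NULL,     -- would reference the names table, omitted here
    PRIMARY KEY (list_id, position)
);
CREATE TABLE parents (
    parent_id        INTEGER PRIMARY KEY,
    children_list_id INTEGER NOT NULL REFERENCES lists(list_id)
);
""")

def get_or_create_list(conn, name_ids):
    """Return the id of the list with exactly these name ids, creating it if needed."""
    key = ",".join(str(n) for n in name_ids)
    row = conn.execute(
        "SELECT list_id FROM lists WHERE list_key = ?", (key,)
    ).fetchone()
    if row is not None:
        return row[0]  # list already stored: reuse its identifier
    list_id = conn.execute(
        "INSERT INTO lists (list_key) VALUES (?)", (key,)
    ).lastrowid
    conn.executemany(
        "INSERT INTO list_items VALUES (?, ?, ?)",
        [(list_id, pos, n) for pos, n in enumerate(name_ids, start=1)],
    )
    return list_id

def insert_parent(conn, name_ids):
    """Insert a parent referencing an existing or new list; return (parent_id, list_id)."""
    list_id = get_or_create_list(conn, name_ids)
    parent_id = conn.execute(
        "INSERT INTO parents (children_list_id) VALUES (?)", (list_id,)
    ).lastrowid
    return parent_id, list_id

with conn:  # one transaction for the three inserts
    p1 = insert_parent(conn, [123, 345, 678])
    p2 = insert_parent(conn, [123, 901])
    p3 = insert_parent(conn, [123, 345, 678])  # same names, same order as p1

print(p1, p2, p3)
```

Because the lists are immutable, the stored key never has to be recomputed. With concurrent writers, the UNIQUE constraint would reject a duplicate key and the insert would need to be retried as a lookup; that handling is left out of the sketch.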

Alternate solution:

The table of lists will store implicit ordering in the columns:

<list_id> <name_1> <name_2> <name_3> <name_4> ... <name_100>
    1        111     222     333      null
    2        444     null
    3        555     111     null

In this table, list_id will be the primary key.

The parents table will keep the list_id as a foreign key.

This solution is somewhat less robust (how many columns do I create? 10? 20? 50?), but it makes insertions much quicker. And since list_id is a key, no additional table is needed. A possible downside, however, is that some queries become much more complicated, since they have to reference multiple columns.
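To make the query complexity concrete, here is a small sketch of the "name A before name B" query against the column-per-position layout. It needs one predicate for every pair of an earlier and a later column. The table name `wide_lists`, the helper `lists_with_a_before_b`, and the width of 4 columns are assumptions chosen to keep the example short; SQLite via Python's sqlite3 is used only for illustration.

```python
import sqlite3

N_COLS = 4  # hypothetical width; the question considers 10, 20, 50 or 100
cols = [f"name_{i}" for i in range(1, N_COLS + 1)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wide_lists (list_id INTEGER PRIMARY KEY, "
    + ", ".join(f"{c} INTEGER" for c in cols)
    + ")"
)
# The three example rows from the table above (unused slots are NULL).
conn.executemany(
    "INSERT INTO wide_lists VALUES (?, ?, ?, ?, ?)",
    [
        (1, 111, 222, 333, None),
        (2, 444, None, None, None),
        (3, 555, 111, None, None),
    ],
)

def lists_with_a_before_b(conn, a, b):
    """Return (matching list ids, number of column-pair predicates used)."""
    # One predicate per (earlier column, later column) pair.
    pairs = [
        f"({cols[i]} = :a AND {cols[j]} = :b)"
        for i in range(N_COLS)
        for j in range(i + 1, N_COLS)
    ]
    sql = (
        "SELECT list_id FROM wide_lists WHERE "
        + " OR ".join(pairs)
        + " ORDER BY list_id"
    )
    ids = [row[0] for row in conn.execute(sql, {"a": a, "b": b})]
    return ids, len(pairs)

ids, n_predicates = lists_with_a_before_b(conn, 111, 333)
print(ids, n_predicates)
```

The number of predicates grows as N(N-1)/2: 6 for 4 columns, 4950 for 100 columns. In the row-per-item layout the same question is a single self-join on the order column.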

Thanks!

Solution

The list table is over-designed. Just have a Parents table, a Names table and a ParentChildren table. The ParentChildren table is just like your list table, except for a few details. It would look like:

<ParentId> <Order> <NameId>
    1         1     123
    1         2     345
    1         3     678
    2         1     901
    3         1     123
    3         2     901

I don't see any particular savings in storing independent lists. Just store the children for each parent.
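A minimal sketch of this three-table layout, together with the two ordering queries mentioned in the question, might look as follows. It uses an in-memory SQLite database through Python's sqlite3 module. The column name `Position` (in place of <Order>, which is an SQL keyword), the index `idx_pc_name`, and the helper functions `named_after` and `named_with_gap` are assumptions made for the example. The second query reads "then had two more children, with the third named Carol" as "Carol is exactly two positions after Alice", which is one interpretation of the wording.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Names   (NameId INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
CREATE TABLE Parents (ParentId INTEGER PRIMARY KEY);
CREATE TABLE ParentChildren (
    ParentId INTEGER NOT NULL REFERENCES Parents(ParentId),
    Position INTEGER NOT NULL,  -- assumed name for the <Order> column
    NameId   INTEGER NOT NULL REFERENCES Names(NameId),
    PRIMARY KEY (ParentId, Position)
);
-- Assumed index so lookups by name do not scan the whole table.
CREATE INDEX idx_pc_name ON ParentChildren (NameId, ParentId, Position);
""")
conn.executemany(
    "INSERT INTO Names VALUES (?, ?)",
    [(123, "Alice"), (345, "Bob"), (678, "Carol"), (901, "Dave")],
)
conn.executemany("INSERT INTO Parents VALUES (?)", [(1,), (2,), (3,)])
conn.executemany(
    "INSERT INTO ParentChildren VALUES (?, ?, ?)",
    [(1, 1, 123), (1, 2, 345), (1, 3, 678), (2, 1, 901), (3, 1, 123), (3, 2, 901)],
)

BASE_QUERY = """
SELECT DISTINCT a.ParentId
FROM ParentChildren a
JOIN ParentChildren b ON b.ParentId = a.ParentId AND {position_rule}
JOIN Names na ON na.NameId = a.NameId
JOIN Names nb ON nb.NameId = b.NameId
WHERE na.Name = :first AND nb.Name = :later
ORDER BY a.ParentId
"""

def named_after(conn, first, later):
    """Parents who named a child `later` at any point after a child `first`."""
    sql = BASE_QUERY.format(position_rule="b.Position > a.Position")
    return [r[0] for r in conn.execute(sql, {"first": first, "later": later})]

def named_with_gap(conn, first, later, gap):
    """Parents whose child `later` comes exactly `gap` positions after `first`."""
    sql = BASE_QUERY.format(position_rule="b.Position = a.Position + :gap")
    params = {"first": first, "later": later, "gap": gap}
    return [r[0] for r in conn.execute(sql, params)]

bob_after_alice = named_after(conn, "Alice", "Bob")
carol_two_after_alice = named_with_gap(conn, "Alice", "Carol", 2)
print(bob_after_alice, carol_two_after_alice)
```

Both queries are a single self-join on ParentChildren, regardless of how many children a parent has.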
