How can we deal with intersection tables that quickly grow very large?

Views: 115 | Published: 2017/3/17 21:57:03 | Tags: database, database-design


Question


For example, we have table A and table B, which have a many-to-many relationship. An intersection table, table C, stores A.id and B.id along with a value that represents a relationship between the two. As a concrete example, imagine stackexchange, which has a user account, a forum, and a karma score. Or a student, a course, and a grade. If tables A and B are very large, table C can, and probably will, grow monstrously large very quickly (in fact, let's just assume it does). How do we go about dealing with such an issue? Is there a better way to design the tables to avoid this?

Recommended Answer


There is no magic. If some rows are connected and some aren't, this information has to be represented somehow, and the "relational" way of doing it is a "junction" (aka "link") table. Yes, a junction table can grow large, but fortunately databases are very capable of handling huge amounts of data.


There are good reasons for using a junction table instead of a comma-separated list (or similar), including:

- Efficient querying (through indexing and clustering).
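A minimal sketch of the indexing argument, using Python's built-in sqlite3 module. SQLite is an assumed stand-in here, not one of the DBMSes discussed in this answer, and the tables parent_csv and junction are hypothetical. The sketch asks SQLite how it would find "all parents of child 5" in a comma-separated column versus in a junction table indexed on the child id.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the 'detail' column of EXPLAIN QUERY PLAN, joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)

conn = sqlite3.connect(":memory:")

# Anti-pattern (hypothetical): child ids packed into a comma-separated string.
conn.execute("CREATE TABLE parent_csv (parent_id INTEGER PRIMARY KEY, child_ids TEXT)")

# Relational alternative: one row per (parent, child) pair, plus a child-first index.
conn.execute("CREATE TABLE junction (parent_id INTEGER, child_id INTEGER, "
             "PRIMARY KEY (parent_id, child_id))")
conn.execute("CREATE INDEX junction_child ON junction (child_id, parent_id)")

# A LIKE over a concatenated expression cannot use an index: every row is examined.
csv_plan = query_plan(
    conn,
    "SELECT parent_id FROM parent_csv WHERE ',' || child_ids || ',' LIKE ?",
    ("%,5,%",),
)

# The junction table answers the same question with an index search.
junction_plan = query_plan(
    conn, "SELECT parent_id FROM junction WHERE child_id = ?", (5,)
)

print(csv_plan)       # expect a full SCAN of parent_csv
print(junction_plan)  # expect a SEARCH using junction_child
```

The exact plan wording differs slightly between SQLite versions, but the contrast between SCAN and SEARCH is the point.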


When designing a junction table, ask the following questions:


Do I need to query in only one direction or both?¹
- If one direction, just create a composite PRIMARY KEY on both foreign keys (let's call them PARENT_ID and CHILD_ID). Order matters: if you query from parent to children, PK should be: {PARENT_ID, CHILD_ID}.
- If both directions, also create a composite index in the opposite order, which is {CHILD_ID, PARENT_ID} in this case.
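The one-direction/both-directions rule can be checked with a small sketch, again using SQLite through Python's sqlite3 as an assumed stand-in. SQLite's WITHOUT ROWID option stores the table in its primary-key B-tree, roughly its counterpart of a clustered table; lowercase parent_id/child_id mirror PARENT_ID/CHILD_ID, and the index name junction_child_parent is hypothetical.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN, joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

conn = sqlite3.connect(":memory:")

# Composite PK in parent-to-child order; WITHOUT ROWID keeps rows in this PK's B-tree.
conn.execute("""
    CREATE TABLE junction (
        parent_id INTEGER NOT NULL,
        child_id  INTEGER NOT NULL,
        PRIMARY KEY (parent_id, child_id)
    ) WITHOUT ROWID
""")

# Parent -> children: served directly by the primary key.
down = query_plan(conn, "SELECT child_id FROM junction WHERE parent_id = 1")

# Child -> parents with no reverse index: nothing usable, so a full scan.
up_before = query_plan(conn, "SELECT parent_id FROM junction WHERE child_id = 1")

# Add the composite index in the opposite order for the child -> parents direction.
conn.execute("CREATE INDEX junction_child_parent ON junction (child_id, parent_id)")
up_after = query_plan(conn, "SELECT parent_id FROM junction WHERE child_id = 1")

print(down)
print(up_before)
print(up_after)
```

Before the reverse index exists, the child-to-parent query has to SCAN the whole table; afterwards both directions become index searches.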

Is the extra data in the junction table small?
- If yes, cluster the table and cover the extra data in the secondary index as necessary.²
- If no, don't cluster the table and don't cover the extra data in the secondary index.³
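To illustrate covering (footnote ²), here is a SQLite sketch under the same assumptions (Python's sqlite3 as a stand-in). SQLite has no INCLUDE clause, so, like the Oracle example in this answer, the extra column is appended to the index key. The helper make_junction and the index name junction_ie1 are hypothetical.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN, joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

def make_junction(cover_extra):
    """Build a clustered junction table; optionally cover extra_data in the secondary index."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE junction (
            parent_id  INTEGER NOT NULL,
            child_id   INTEGER NOT NULL,
            extra_data TEXT,
            PRIMARY KEY (parent_id, child_id)
        ) WITHOUT ROWID
    """)
    cols = "child_id, parent_id, extra_data" if cover_extra else "child_id, parent_id"
    conn.execute(f"CREATE INDEX junction_ie1 ON junction ({cols})")
    return conn

# Child -> parents query that also needs the extra data.
sql = "SELECT parent_id, extra_data FROM junction WHERE child_id = 1"
covered = query_plan(make_junction(cover_extra=True), sql)
uncovered = query_plan(make_junction(cover_extra=False), sql)

print(covered)
print(uncovered)
```

With extra_data in the index, the query should be answered from the index alone (COVERING INDEX); without it, SQLite has to go back to the primary-key B-tree for each row, which is the double lookup described in footnote ².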

Will other (child) tables reference the junction table?
- If yes, consider whether adding a surrogate key might be worthwhile to keep child FKs slim. But beware that adding a surrogate key will probably eliminate the opportunity for clustering.
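The surrogate-key trade-off can be sketched in SQLite as well (assumed stand-in via Python's sqlite3; junction_note is a hypothetical child table and junction_id a hypothetical surrogate key). The sketch compares how many FK columns the child table needs, and whether a parent-to-children lookup is still served by the primary key.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN, joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

def fk_columns(conn, table):
    """PRAGMA foreign_key_list returns one row per referencing column; index 3 is 'from'."""
    return [row[3] for row in conn.execute(f"PRAGMA foreign_key_list({table})")]

# Natural composite key: clustered, but the child table must carry both columns.
natural = sqlite3.connect(":memory:")
natural.executescript("""
    CREATE TABLE junction (
        parent_id  INTEGER NOT NULL,
        child_id   INTEGER NOT NULL,
        extra_data TEXT,
        PRIMARY KEY (parent_id, child_id)
    ) WITHOUT ROWID;
    CREATE TABLE junction_note (
        parent_id INTEGER NOT NULL,
        child_id  INTEGER NOT NULL,
        note      TEXT,
        FOREIGN KEY (parent_id, child_id) REFERENCES junction (parent_id, child_id)
    );
""")

# Surrogate key: the child carries one slim column, but rows are now organized
# by junction_id, so (parent_id, child_id) lives in a separate UNIQUE index.
surrogate = sqlite3.connect(":memory:")
surrogate.executescript("""
    CREATE TABLE junction (
        junction_id INTEGER PRIMARY KEY,
        parent_id   INTEGER NOT NULL,
        child_id    INTEGER NOT NULL,
        extra_data  TEXT,
        UNIQUE (parent_id, child_id)
    );
    CREATE TABLE junction_note (
        junction_id INTEGER NOT NULL REFERENCES junction (junction_id),
        note        TEXT
    );
""")

natural_fk = fk_columns(natural, "junction_note")
surrogate_fk = fk_columns(surrogate, "junction_note")

sql = "SELECT child_id, extra_data FROM junction WHERE parent_id = 1"
natural_plan = query_plan(natural, sql)
surrogate_plan = query_plan(surrogate, sql)

print(natural_fk, natural_plan)
print(surrogate_fk, surrogate_plan)
```

The surrogate version gives junction_note a single FK column, but the parent-to-children lookup now goes through a secondary index instead of the primary key, which is the lost clustering opportunity mentioned above.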



In many cases, the answers to these questions will be: both, yes, and no, in which case your table will look similar to this (Oracle syntax below):

CREATE TABLE JUNCTION_TABLE (
    PARENT_ID INT,
    CHILD_ID INT,
    EXTRA_DATA VARCHAR2(50),
    PRIMARY KEY (PARENT_ID, CHILD_ID),
    FOREIGN KEY (PARENT_ID) REFERENCES PARENT_TABLE (PARENT_ID),
    FOREIGN KEY (CHILD_ID) REFERENCES CHILD_TABLE (CHILD_ID)
) ORGANIZATION INDEX COMPRESS;

CREATE UNIQUE INDEX JUNCTION_TABLE_IE1 ON
    JUNCTION_TABLE (CHILD_ID, PARENT_ID, EXTRA_DATA) COMPRESS;

Notes:


- ORGANIZATION INDEX: Oracle-specific syntax for what most DBMSes call clustering. Other DBMSes have their own syntax, and some (MySQL/InnoDB) imply clustering, which the user cannot turn off.
- COMPRESS: Some DBMSes support leading-edge index compression. Since a clustered table is essentially an index, compression can be applied to it as well.
- JUNCTION_TABLE_IE1, EXTRA_DATA: Since the extra data is covered by the secondary index, the DBMS can get it without touching the table when querying in the direction from child to parents. The primary key acts as a clustering key, so the extra data is naturally covered when querying from a parent to its children.


Physically, you have just two B-Trees (one is the clustered table and the other is the secondary index) and no table heap at all. This translates to good querying performance (both parent-to-child and child-to-parent directions can be satisfied by a simple index range scan) and fairly small overhead when inserting/deleting rows.
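The "two B-trees, no heap" point can be checked in SQLite (assumed stand-in via Python's sqlite3) by counting root pages in sqlite_master. SQLite has no true heap: an ordinary rowid table is itself a B-tree keyed on rowid, which plays the heap's role in this comparison, while WITHOUT ROWID approximates ORGANIZATION INDEX. The DDL mirrors the Oracle example above with hypothetical lowercase names.

```python
import sqlite3

# {options} is filled with "WITHOUT ROWID" (clustered) or "" (ordinary rowid table).
DDL = """
    CREATE TABLE junction (
        parent_id  INTEGER NOT NULL,
        child_id   INTEGER NOT NULL,
        extra_data TEXT,
        PRIMARY KEY (parent_id, child_id)
    ) {options};
    CREATE UNIQUE INDEX junction_ie1 ON junction (child_id, parent_id, extra_data);
"""

def btree_count(options):
    """Count the B-trees (root pages) SQLite allocates for the junction table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL.format(options=options))
    (count,) = conn.execute(
        "SELECT count(*) FROM sqlite_master WHERE tbl_name = 'junction' AND rootpage > 0"
    ).fetchone()
    return count

clustered = btree_count("WITHOUT ROWID")  # PK-organized table + junction_ie1
rowid_table = btree_count("")             # rowid table + PK autoindex + junction_ie1

print(clustered, rowid_table)
```

The WITHOUT ROWID layout needs two B-trees, whereas the rowid layout needs three, because the composite primary key becomes a separate automatic index next to the rowid table.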


Here is the equivalent MS SQL Server syntax (sans index compression):

CREATE TABLE JUNCTION_TABLE (
    PARENT_ID INT,
    CHILD_ID INT,
    EXTRA_DATA VARCHAR(50),
    PRIMARY KEY (PARENT_ID, CHILD_ID),
    FOREIGN KEY (PARENT_ID) REFERENCES PARENT_TABLE (PARENT_ID),
    FOREIGN KEY (CHILD_ID) REFERENCES CHILD_TABLE (CHILD_ID)
);

CREATE UNIQUE INDEX JUNCTION_TABLE_IE1 ON
    JUNCTION_TABLE (CHILD_ID, PARENT_ID) INCLUDE (EXTRA_DATA);


¹ In other words, do you only need to get the "children" of a given "parent", or might you also need to get the parents of a given child?


² Covering allows the query to be satisfied from the index alone, and avoids the expensive double lookup that would otherwise be necessary when accessing data through a secondary index on a clustered table.


³ This way, the extra data is not repeated (which would be expensive, since it's big), yet you avoid the double lookup and replace it with a (cheaper) table heap access. But beware of the clustering factor, which can destroy the performance of range scans in heap-based tables!
