交叉表索引是否可行? [英] Is cross-table indexing possible?

查看:120
本文介绍了交叉表索引是否可行?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

考虑一种结构,在这种结构中,您与两个表上的条件(where,order by等)具有多对一(或一对多)关系。例如:

Consider a structure where you have a many-to-one (or one-to-many) relationship with a condition (where, order by, etc.) on both tables. For example:

CREATE TABLE tableTwo (
    id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
    eventTime DATETIME NOT NULL,
    INDEX (eventTime)
) ENGINE=InnoDB;

CREATE TABLE tableOne (
    id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
    tableTwoId INT UNSIGNED NOT NULL,
    objectId INT UNSIGNED NOT NULL,
    INDEX (objectID),
    FOREIGN KEY (tableTwoId) REFERENCES tableTwo (id)
) ENGINE=InnoDB;

以及示例查询:

select * from tableOne t1 
  inner join tableTwo t2 on t1.tableTwoId = t2.id
  where objectId = '..'
  order by eventTime;

假设您索引 tableOne.objectId 并且 tableTwo.eventTime 。如果您然后解释上面的查询,它将显示使用filesort。本质上,它首先应用 tableOne.objectId 索引,但它不能应用 tableTwo.eventTime 索引,因为index是整个tableTwo(不是有限的结果集),因此它必须进行手动排序。

Let's say you index tableOne.objectId and tableTwo.eventTime. If you then explain on the above query, it will show "Using filesort". Essentially, it first applies the tableOne.objectId index, but it can't apply the tableTwo.eventTime index because that index is for the entirety of tableTwo (not the limited result set), and thus it must do a manual sort.

因此,是否有办法进行跨表索引,因此每次检索结果时都不需要文件排序?某事喜欢:

create index ind_t1oi_t2et on tableOne t1 
  inner join tableTwo t2 on t1.tableTwoId = t2.id 
  (t1.objectId, t2.eventTime);

此外,我已着手创建视图并将其编入索引,但视图不支持索引。

Also, I've looked into creating a view and indexing that, but indexing is not supported for views.

如果无法进行跨表索引,我一直倾向于使用的解决方案是在一个表中复制条件数据。在这种情况下,意味着 eventTime 将在 tableOne 中复制,并且将在<$上设置多列索引c $ c> tableOne.objectId 和 tableOne.eventTime (基本上是手动创建索引)。但是,我以为我会首先找出其他人的经验,看看这是不是最好的方式。

The solution I've been leaning towards if cross-table indexing isn't possible is replicating the conditional data in one table. In this case that means eventTime would be replicated in tableOne and a multi-column index would be set up on tableOne.objectId and tableOne.eventTime (essentially manually creating the index). However, I thought I'd seek out other people's experience first to see if that was the best way.

非常感谢!

更新:

以下是加载测试数据和比较结果的一些过程:

Here are some procedures for loading test data and comparing results:

drop procedure if exists populate_table_two;
delimiter #
create procedure populate_table_two(IN numRows int)
begin
declare v_counter int unsigned default 0;
  while v_counter < numRows do
    insert into tableTwo (eventTime) 
    values (CURRENT_TIMESTAMP - interval 0 + floor(0 + rand()*1000) minute);
    set v_counter=v_counter+1;
  end while;
end #
delimiter ;

drop procedure if exists populate_table_one;
delimiter #
create procedure populate_table_one
   (IN numRows int, IN maxTableTwoId int, IN maxObjectId int)
begin
declare v_counter int unsigned default 0;
  while v_counter < numRows do
    insert into tableOne (tableTwoId, objectId) 
      values (floor(1 +(rand() * maxTableTwoId)), 
              floor(1 +(rand() * maxObjectId)));
    set v_counter=v_counter+1;
  end while;
end #
delimiter ;

您可以按照以下方式在 tableTwo tableOne 中的20,000行(随机引用 tableOne 和随机 objectId 介于1和5之间),分别需要26.2秒和70.77秒来运行:

You can use these as follows to populate 10,000 rows in tableTwo and 20,000 rows in tableOne (with random references to tableOne and random objectIds between 1 and 5), which took 26.2 and 70.77 seconds respectively to run for me:

call populate_table_two(10000);
call populate_table_one(20000, 10000, 5);

更新2(经过测试的触发SQL):

下面是基于daniHp触发方法的久经考验的SQL。当 tableOne tableOne 时,这会使 dateTime 同步 tableOne 添加或 tableTwo 已更新。此外,如果将条件列复制到连接表,则此方法也应适用于多对多关系。在我测试 tableOne 中的300,000行和 tableTwo 中的200,000行时,具有类似限制的旧查询的速度是0.12秒,新查询的速度仍显示为0.00秒。因此,有一个明显的改进,这种方法应该在数百万行和更远的行中表现良好。

Below is the tried and tested SQL based on daniHp's triggering method. This keeps the dateTime in sync on tableOne when tableOne is added or tableTwo is updated. Also, this method should also work for many-to-many relationships if the condition columns are copied to the joining table. In my testing of 300,000 rows in tableOne and 200,000 rows in tableTwo, the speed of the old query with similar limits was 0.12 sec and the speed of the new query still shows as 0.00 seconds. Thus, there is a clear improvement, and this method should perform well into the millions of rows and farther.

alter table tableOne add column tableTwo_eventTime datetime;

create index ind_t1_oid_t2et on tableOne (objectId, tableTwo_eventTime);

drop TRIGGER if exists t1_copy_t2_eventTime;
delimiter #
CREATE TRIGGER t1_copy_t2_eventTime
   BEFORE INSERT ON tableOne
for each row
begin
  set NEW.tableTwo_eventTime = (select eventTime 
       from tableTwo t2
       where t2.id = NEW.tableTwoId);
end #
delimiter ;

drop TRIGGER if exists upd_t1_copy_t2_eventTime;
delimiter #
CREATE TRIGGER upd_t1_copy_t2_eventTime
   BEFORE UPDATE ON tableTwo
for each row
begin
  update tableOne 
    set tableTwo_eventTime = NEW.eventTime 
    where tableTwoId = NEW.id;
end #
delimiter ;

更新的查询:

select * from tableOne t1 
  inner join tableTwo t2 on t1.tableTwoId = t2.id
  where t1.objectId = 1
  order by t1.tableTwo_eventTime desc limit 0,10;


推荐答案

如您所知,SQLServer通过索引视图

As you know, SQLServer achieves this with indexed views:


索引视图提供额外的性能优势,使用标准索引无法实现
。索引视图可以通过以下方式提高查询
的性能:

indexed views provide additional performance benefits that cannot be achieved using standard indexes. Indexed views can increase query performance in the following ways:

可以预先计算聚合并将其存储在索引中,以最大限度地减少查询执行期间的
昂贵的计算。

Aggregations can be precomputed and stored in the index to minimize expensive computations during query execution.

可以预先加入表格并存储结果数据集。

Tables can be prejoined and the resulting data set stored.

可以存储联接或聚合的组合。

Combinations of joins or aggregations can be stored.

在SQLServer中,要利用这种技术,必须查询视图而不是查询表。这意味着您应该了解视图和索引。

In SQLServer, to take advantage of this technique, you must query over the view and not over the tables. That means that you should know about the view and indexes.

MySQL没有索引视图,但您可以模拟表+触发器+索引的行为

MySQL does not have indexed views, but you can simulate the behavior with table + triggers + indexes.

您必须创建一个索引表,一个使数据表保持最新的触发器,而不是创建一个视图,然后您必须查询新的表而不是规范化表。

Instead of creating a view, you must create an indexed table, a trigger to keep the data table up to date, and then you must query your new table instead of your normalized tables.

您必须评估写操作的开销是否抵消了读操作的改进。

You must evaluate if the overhead of write operations offsets the improvement in read operations.

已编辑:

请注意,并非始终需要创建新表。例如,在1:N关系(主 - 详细信息)触发器中,您可以将主表中的字段副本保留到详细信息表中。在你的情况下:

Note that it is not always necessary to create a new table. For example, in a 1:N relationship (master-detail) trigger, you can keep a copy of a field from the 'master' table into the 'detail' table. In your case:

CREATE TABLE tableOne (
    id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
    tableTwoId INT UNSIGNED NOT NULL,
    objectId INT UNSIGNED NOT NULL,
    desnormalized_eventTime DATETIME NOT NULL,
    INDEX (objectID),
    FOREIGN KEY (tableTwoId) REFERENCES tableTwo (id)
) ENGINE=InnoDB;

CREATE TRIGGER tableOne_desnormalized_eventTime
   BEFORE INSERT ON tableOne
for each row
begin
  DECLARE eventTime DATETIME;
  SET eventTime = 
      (select eventTime 
       from tableOne
       where tableOne.id = NEW.tableTwoId);
  NEW.desnormalized_eventTime = eventTime;
end;

请注意,这是一个插入触发器。

Notice that this is a before insert trigger.

现在,查询被重写如下:

Now, the query is rewritten as follows:

select * from tableOne t1 
  inner join tableTwo t2 on t1.tableTwoId = t2.id
  where t1.objectId = '..'
  order by t1.desnormalized_eventTime;

免责声明:未经测试。

这篇关于交叉表索引是否可行?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆