选择不同的...内部联接与选择...其中(...)中的id [英] Select distinct ... inner join vs. select ... where id in (...)
问题描述
我正在尝试创建表的一个子集(作为实例化视图),定义为那些在另一个实例化视图中具有匹配记录的记录.
I'm trying to create a subset of a table (as a materialized view), defined as those records which have a matching record in another materialized view.
例如,假设我有一个包含user_id和name列的Users表,以及一个包含entry_id,user_id,activity和timestamp列的Log表.
For example, let's say I have a Users table with user_id and name columns, and a Log table, with entry_id, user_id, activity, and timestamp columns.
首先,我创建Log表的实例化视图,仅选择带有timestamp> some_date的行.现在,我希望在日志表的快照中引用用户的物料视图.我可以将其创建为
First I create a materialized view of the Log table, selecting only those rows with timestamp > some_date. Now I want a materliazed view of the Users referenced in my snapshot of the Log table. I can either create it as
select * from Users where user_id in (select user_id from Log_mview)
或者我可以
select distinct u.* from Users u inner join Log_mview l on u.user_id = l.user_id
(需要有区别,以避免来自具有多个日志条目的用户的多次点击).
(need the distinct to avoid multiple hits from users with multiple log entries).
前者看起来更干净,更优雅,但需要更长的时间.我想念什么吗?有更好的方法吗?
The former seems cleaner and more elegant, but takes much longer. Am I missing something? Is there a better way to do this?
where exists
子句有很大帮助,但条件使用OR
的情况除外.例如,假设上面的Log表也有一个user_name列,并且将Log条目与Users记录匹配的正确方法是当任一列(用户ID或用户名)匹配时.我发现了
The where exists
clause helped a lot, except in the case where the condition uses an OR
. For example, let's say the Log table above also had a user_name column, and the correct way to match a Log entry to a Users record is when either of the columns (user id or user name) match. I'm finding that
select distinct u.* from Users u
inner join Log_mview l
on u.user_id = l.user_id or u.name = l.user_name
比
select * from Users u where exists
(select id from Log_mview l
where l.user_id = u.user_id or l.user_name = u.name)
有帮助吗?
(关于解释计划... Lemme正在对其进行消毒,或者宁愿...我会在一段时间内发布它们.)
(Regarding the explain plan... Lemme work on sanitizing it, or them, rather... I'll post them in a while.)
说明计划: 对于具有内部联接的查询:
explain plans: For the query with inner join:
Plan hash value: 436698422
---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4539K| 606M| | 637K (3)| 02:07:25 |
| 1 | HASH UNIQUE | | 4539K| 606M| 3201M| 637K (3)| 02:07:25 |
| 2 | CONCATENATION | | | | | | |
|* 3 | HASH JOIN | | 4206K| 561M| 33M| 181K (4)| 00:36:14 |
| 4 | BITMAP CONVERSION TO ROWIDS | | 926K| 22M| | 2279 (1)| 00:00:28 |
| 5 | BITMAP INDEX FAST FULL SCAN| I_M_LOG_MVIEW_4 | | | | | |
|* 6 | TABLE ACCESS FULL | USERS | 15M| 1630M| | 86638 (6)| 00:17:20 |
|* 7 | HASH JOIN | | 7646K| 1020M| 33M| 231K (4)| 00:46:13 |
| 8 | BITMAP CONVERSION TO ROWIDS | | 926K| 22M| | 2279 (1)| 00:00:28 |
| 9 | BITMAP INDEX FAST FULL SCAN| I_M_LOG_MVIEW_4 | | | | | |
| 10 | TABLE ACCESS FULL | USERS | 23M| 2515M| | 87546 (7)| 00:17:31 |
---------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("U"."NAME"="L"."USER_NAME")
6 - filter("U"."NAME" IS NOT NULL)
7 - access("U"."USER_ID"=TO_NUMBER("L"."USER_ID"))
filter(LNNVL("U"."NAME"="L"."USER_NAME") OR LNNVL("U"."NAME" IS NOT NULL))
Note
-----
- dynamic sampling used for this statement
对于使用where exists
的用户:
Plan hash value: 2786958565
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 114 | 21M (1)| 70:12:13 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL | USERS | 23M| 2515M| 87681 (7)| 00:17:33 |
| 3 | BITMAP CONVERSION TO ROWIDS | | 7062 | 179K| 1 (0)| 00:00:01 |
|* 4 | BITMAP INDEX FAST FULL SCAN| I_M_LOG_MVIEW_4 | | | | |
-----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( EXISTS (SELECT /*+ */ 0 FROM "MYSCHEMA"."LOG_MVIEW"
"LOG_MVIEW" WHERE ("USER_NAME"=:B1 OR TO_NUMBER("USER_ID")=:B2) AND
("USER_NAME"=:B3 OR TO_NUMBER("USER_ID")=:B4) AND ("USER_NAME"=:B5 OR
TO_NUMBER("USER_ID")=:B6)))
4 - filter("USER_NAME"=:B1 OR TO_NUMBER("USER_ID")=:B2)
Note
-----
- dynamic sampling used for this statement
更改数据库对象名称以保护无辜者. :p
DB object names changed to protect the innocent. :p
推荐答案
尝试一下
select * from Users u
where exists
( select user_id
from Log_mview l
where l.user_id = u.user_id )
/
如果子查询返回大量行,则WHERE EXISTS
可能比WHERE ... IN
快得多.
If the sub-query returns a large number of rows WHERE EXISTS
can be substantially faster than WHERE ... IN
.
这篇关于选择不同的...内部联接与选择...其中(...)中的id的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!