使用“IN”的不同结果查询和“加入”询问 [英] Different result using "IN" query and "JOIN" query

查看:59
本文介绍了使用“IN”的不同结果查询和“加入”询问的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有两张桌子t1和t2。

从t1: -

总记录sel_col-distinct

28564502 12257389


从t2: -

总记录sel_col-distinct

11217091 10890077



我正在执行以下查询: -

I've two tables t1 and t2.
From t1:-
total records sel_col-distinct
28564502 12257389

From t2:-
total records sel_col-distinct
11217091 10890077

I am executing the below queries:-

select count( distinct t1.sel_col ) distinct, count(sel_col) total
from 
t1(nolock) join t2(nolock)
on t1.sel_col = CONVERT(VARBINARY(max),t2.sel_col,2)

distinct total

10600479 25666519

时间:-00:50分钟



distinct total
10600479 25666519
time:-00:50 mins

SELECT count(distinct t1.sel_col) total, count(t1.sel_col) total FROM 
t1(nolock)
where t1.sel_col in 
(select CONVERT(VARBINARY(max),t2.sel_col,2) from t2(nolock))

不同总额

时间: -12:54分钟



明显的价值相同,但总数之间存在差异。第二个查询花费了大量时间。有没有办法得到理想的结果?另外为什么总计数有这么大差异?



提前谢谢

distinct total
10600479 24835271
time:-12:54mins

The distinct value is coming same, but there is difference between total. The 2nd query is taking huge time. Is there any way to get the desired result? Also why there is so much difference in the total count?

Thanks in advance

推荐答案

这是因为连接具有乘以行的能力。请看下表:



It's because the join has the ability to multiply rows. Look at the following tables:

TableA
Cust    Value 
1        1
1        2
2        1

TableB
Cust
1
1
2





当你加入这两个你得到





When you join the two you get

a.Cust b.Cust Value
1      1      1
1      1      1
1      1      2
1      1      2
2      2      1





因此,如果你计算这些值,你会得到5在哪里使用你获得与TableA相同的数据集,所以计数是3.



如果你运行这个SQL:





So if you count these values you get 5 where are using the in you get that same dataset as TableA so the count is 3.

If you run this SQL:

SELECT SUM(COUNT(*) - 1) diff, COUNT(*) total, sel_col FROM t2 GROUP BY sel_col HAVING COUNT(*) > 1





如果这返回任何结果,则意味着您的计数正在成倍增加通过加入。结果中的diff列将等于两个结果之间的差异。



如果您需要具有与in相同结果的连接,请尝试以下操作。 />




If this returns any results it means your count is being multiplied by the join. The diff column in the result will equal the difference between the two results.

If you need a join with the same result as the in, try the following.

SELECT COUNT(t1.sel_col) FROM t1 (NOLOCK) INNER JOIN (SELECT DISTINCT sel_col FROM t2 (NOLOCK)) t2





加入子查询何时子查询是不同的将减少重复和连接的乘法效应。



Joining to the sub query when the sub query is distinct will reduce duplicates and the multiplying effect of the join.


他们采取不同时间的原因是第二个使用子查询,这是慢的。比较varbinary(MAX)听起来就像是在比较文件,这总是很慢。考虑在varbinary中存储数据的哈希值,以便更快地进行比较。然后你可以在哈希匹配的地方进行更深入的比较。



你正在抛出DISTINCT关键字。这通常意味着您的SQL存在其他隐藏的问题。



你是说第二个结果是正确的吗?什么是桌子结构?



好​​的,我现在看到了。你把总数放在那里。你的第一个选择是相同的,它是一个独特的。第二,不是。如果在t2中两次具有相同的值,则内部联接将在结果中创建该行的2个副本,即使它在t1中一次也是如此。执行子查询不是连接,因此不会创建重复的行。我想,这就是你使用DISTINCT的原因。相反,您应该修复数据库结构。
The reason they take different times is that the second uses a subquery, which is slow. Comparing varbinary(MAX) sounds like you're comparing files, which is always slow. Consider storing a hash of the data in your varbinary, for faster comparisons. You can then do a deeper comparison where the hashes match.

You are throwing the DISTINCT keyword about a bit. This often means that your SQL has other issues which you are hiding.

Are you saying the second result is correct ? What is the table structure ?

OK, I see now. You have put the totals there. Your first select is the same, it's a distinct. The second, is not. If you have the same value twice in t2, an inner join will create 2 copies of that row in your results, even if it's in t1, once. Doing a subquery, is not a join, so it would not create duplicate rows. This is, I assume, why you're using DISTINCT. Instead, you should fix your database structure.


这篇关于使用“IN”的不同结果查询和“加入”询问的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆