SQL帮助:如何从这个查询的总和是不同的求和查询? [英] SQL Help: How come the total from this query is different that a summation query?
问题描述
此查询按lead_source_id执行分组操作:
This query does a group by on lead_source_id:
SELECT ch.lead_source_id,
Count(DISTINCT ch.repurchased_date)
FROM customers_history ch
WHERE ch.repurchased_date >= '2014-04-01'
AND ch.repurchased_date < '2014-05-01'
AND ch.lead_source_id IS NOT NULL
GROUP BY ch.lead_source_id;
此查询汇总表中的记录:
And this query totals the records in the table:
SELECT Count(DISTINCT( repurchased_date ))
FROM customers_history
INNER JOIN (SELECT DISTINCT( customer_id ) AS xcid
FROM customers_history
WHERE repurchased_date >= '2014-04-01'
AND repurchased_date < '2014-05-01'
AND lead_source_id IS NOT NULL) AS Temp
ON Temp.xcid = customer_id
WHERE repurchased_date >= '2014-04-01'
AND repurchased_date < '2014-05-01'
AND lead_source_id IS NOT NULL;
在我们的生产数据中,Query1的总计达到7963,但第二个查询打印7905。
On our production data, the totals from Query1 come to 7963, but the second query prints 7905. Why the difference and how can we fix our queries?
这是我们的表格布局:
+--------+-------------+----------------+---------------------+--------+
| id | customer_id | lead_source_id | repurchased_date | Rating |
+--------+-------------+----------------+---------------------+--------+
| 422923 | 420450 | 4 | 2014-04-14 09:16:48 | Warm |
| 422924 | 420450 | 4 | 2014-04-14 09:16:48 | Cold |
| 422956 | 420450 | 4 | 2014-04-14 09:16:49 | Hot |
| 422933 | 420451 | 37 | 2014-04-14 09:18:41 | Hot |
| 422938 | 420452 | 1 | 2014-04-10 20:50:30 | Hot |
| 422984 | 420452 | 1 | 2014-04-12 20:50:30 | Warm |
| 422940 | 420453 | 47 | 2014-04-14 09:20:27 | Hot |
+--------+-------------+----------------+---------------------+--------+
EDIT
回答一些关于null的可能性:
EDIT
To answer some of the possibilities about nulls:
select count(id)from customers_history where customer_id is null
:0
从customers_history选择计数lead_source_id is null
:5103
从customers_history选择count(id),其中repurchased_date为null
:0
推荐答案
最明显的结论是一些 lead_source_id
The most obvious conclusion is that some lead_source_id
s share values of repurchased_date
.
另一个可能性是你有 NULL $ c
值
customer_id
,第二个过滤这些。
Another possibility is that you have NULL
values for customer_id
and the second filters these out.
第三种可能是 NULL
的值 lead_source_id
在第一个查询中添加附加值。
The third possibility is that NULL
values of lead_source_id
are adding additional values in the first query.
这篇关于SQL帮助:如何从这个查询的总和是不同的求和查询?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!