MySQL Subquery with User-Defined Variables
Question
I'm trying to write a query that needs a calculated column, using a subquery that receives its date reference through a variable. I'm not sure whether I'm "doing it right", but the query essentially never finishes and spins for minutes on end. This is my query:
select @groupdate:=date_format(order_date,'%Y-%m'), count(distinct customer_email) as num_cust,
(
select count(distinct cev.customer_email) as num_prev
from _pj_cust_email_view cev
inner join _pj_cust_email_view as prev_purch on (prev_purch.order_date < @groupdate) and (cev.customer_email=prev_purch.customer_email)
where cev.order_date > @groupdate
) as prev_cust_count
from _pj_cust_email_view
group by @groupdate;
The subquery's inner join performs a self-join that gives me only the count of people who purchased prior to the date in @groupdate. The EXPLAIN is below:
+----+----------------------+---------------------+------+---------------+-----------+---------+---------------------------+--------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+----------------------+---------------------+------+---------------+-----------+---------+---------------------------+--------+---------------------------------+
| 1 | PRIMARY | _pj_cust_email_view | ALL | NULL | NULL | NULL | NULL | 140147 | Using temporary; Using filesort |
| 2 | UNCACHEABLE SUBQUERY | cev | ALL | IDX_EMAIL | NULL | NULL | NULL | 140147 | Using where |
| 2 | UNCACHEABLE SUBQUERY | prev_purch | ref | IDX_EMAIL | IDX_EMAIL | 768 | cart_A.cev.customer_email | 1 | Using where |
+----+----------------------+---------------------+------+---------------+-----------+---------+---------------------------+--------+---------------------------------+
The structure of the table _pj_cust_email_view is as follows:
'_pj_cust_email_view', 'CREATE TABLE `_pj_cust_email_view` (
`order_date` varchar(10) CHARACTER SET utf8 DEFAULT NULL,
`customer_email` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
KEY `IDX_EMAIL` (`customer_email`),
KEY `IDX_ORDERDATE` (`order_date`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1'
Again, as I said earlier, I'm not really sure that this is the best way to accomplish this. Any criticism or direction is appreciated!
Update
I've made a little progress: I'm now doing the above procedurally, iterating through all known months (rather than the months found in the database) and setting the variables ahead of time. I still don't like this. This is what I've got now:
-- Set the user-defined variables
set @startdate:='2010-08', @enddate:='2010-09';
-- Get the total number of distinct emails in the given range
select count(distinct customer_email) as num_cust
from _pj_cust_email_view
where order_date between @startdate and @enddate;
-- Get the total number of those customers who also purchased before the given range
select count(distinct cev.customer_email) as num_prev
from _pj_cust_email_view cev
inner join _pj_cust_email_view as prev_purch on (prev_purch.order_date < @startdate) and (cev.customer_email=prev_purch.customer_email)
where cev.order_date between @startdate and @enddate;
Here @startdate is set to the start of the month and @enddate marks the end of that month's range.
I really feel like this can still be done in one full query.
Recommended answer
I don't think you need to use subqueries at all, nor do you need to iterate over months.
Instead, I recommend you create a table to store all months. Even if you prepopulate it with 100 years of months, it would only have 1200 rows in it, which is trivial.
CREATE TABLE Months (
start_date DATE,
end_date DATE,
PRIMARY KEY (start_date, end_date)
);
INSERT INTO Months (start_date, end_date)
VALUES ('2011-03-01', '2011-03-31');
Store the actual start and end dates, so you can use the DATE data type and index the two columns properly.
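As a sketch of how the Months table could be prepopulated in one statement instead of row by row: the example below assumes MySQL 8.0 or later (it relies on a recursive common table expression), an empty Months table, and an arbitrary two-year range chosen purely for illustration. None of these assumptions come from the original answer.

```sql
-- Assumption: MySQL 8.0+ and the Months table defined above, currently empty.
-- The range 2010-01 through 2011-12 is an illustrative choice.
INSERT INTO Months (start_date, end_date)
WITH RECURSIVE month_starts (start_date) AS (
    -- Anchor row: first day of the first month to generate
    SELECT DATE('2010-01-01')
    UNION ALL
    -- Each step advances to the first day of the following month
    SELECT start_date + INTERVAL 1 MONTH
    FROM month_starts
    WHERE start_date < '2011-12-01'
)
-- LAST_DAY() returns the final day of the month containing its argument
SELECT start_date, LAST_DAY(start_date)
FROM month_starts;
```

LAST_DAY() accounts for differing month lengths and leap years. Generating more than 1000 months this way would likely require raising the cte_max_recursion_depth session variable, whose default is 1000.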
Edit: I think I understand your requirement a bit better now, and I've cleaned up this answer. The following query may be right for you:
SELECT DATE_FORMAT(m.start_date, '%Y-%m') AS month,
COUNT(DISTINCT cev.customer_email) AS current,
GROUP_CONCAT(DISTINCT cev.customer_email) AS current_email,
COUNT(DISTINCT prev.customer_email) AS earlier,
GROUP_CONCAT(DISTINCT prev.customer_email) AS earlier_email
FROM Months AS m
LEFT OUTER JOIN _pj_cust_email_view AS cev
ON cev.order_date BETWEEN m.start_date AND m.end_date
INNER JOIN Months AS mprev
ON mprev.start_date <= m.start_date
LEFT OUTER JOIN _pj_cust_email_view AS prev
ON prev.order_date BETWEEN mprev.start_date AND mprev.end_date
GROUP BY month;
If you create the following compound index on your table:
CREATE INDEX order_email on _pj_cust_email_view (order_date, customer_email);
Then the query has the best chance of being an index-only query, and should run a lot faster.
Below is the EXPLAIN optimization report for this query. Note type: index for each table.
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: m
type: index
possible_keys: PRIMARY
key: PRIMARY
key_len: 6
ref: NULL
rows: 4
Extra: Using index; Using temporary; Using filesort
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: mprev
type: index
possible_keys: PRIMARY
key: PRIMARY
key_len: 6
ref: NULL
rows: 4
Extra: Using where; Using index; Using join buffer
*************************** 3. row ***************************
id: 1
select_type: SIMPLE
table: cev
type: index
possible_keys: order_email
key: order_email
key_len: 17
ref: NULL
rows: 10
Extra: Using index
*************************** 4. row ***************************
id: 1
select_type: SIMPLE
table: prev
type: index
possible_keys: order_email
key: order_email
key_len: 17
ref: NULL
rows: 10
Extra: Using index
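A quick way to check whether the compound index can serve a lookup on its own is to run EXPLAIN on a smaller probe query. The sketch below assumes the order_email index has been created as shown earlier; the date range is an arbitrary example, and the optimizer may still pick a different index depending on the data.

```sql
-- Assumption: the order_email index on (order_date, customer_email) exists.
-- Both selected columns are part of that index, so no row lookup is needed
-- if the optimizer chooses it.
EXPLAIN
SELECT order_date, customer_email
FROM _pj_cust_email_view
WHERE order_date BETWEEN '2011-03-01' AND '2011-03-31';
```

If the Extra column of the output contains "Using index", the statement was resolved from the index alone without reading the table rows.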
Here's some test data:
INSERT INTO Months (start_date, end_date) VALUES
('2011-03-01', '2011-03-31'),
('2011-02-01', '2011-02-28'),
('2011-01-01', '2011-01-31'),
('2010-12-01', '2010-12-31');
INSERT INTO _pj_cust_email_view (customer_email, order_date) VALUES
('ron', '2011-03-10'),
('hermione', '2011-03-15'),
('hermione', '2011-02-15'),
('hermione', '2011-01-15'),
('hermione', '2010-12-15'),
('neville', '2011-01-10'),
('harry', '2011-03-19'),
('harry', '2011-02-10'),
('molly', '2011-03-25'),
('molly', '2011-01-10');
Here's the result given that data, including the concatenated lists of emails to make it easier to see.
+---------+---------+--------------------------+---------+----------------------------------+
| month | current | current_email | earlier | earlier_email |
+---------+---------+--------------------------+---------+----------------------------------+
| 2010-12 | 1 | hermione | 1 | hermione |
| 2011-01 | 3 | neville,hermione,molly | 3 | hermione,molly,neville |
| 2011-02 | 2 | hermione,harry | 4 | harry,hermione,molly,neville |
| 2011-03 | 4 | molly,ron,harry,hermione | 5 | molly,ron,hermione,neville,harry |
+---------+---------+--------------------------+---------+----------------------------------+