将`rand()`与`having`一起使用 [英] Using `rand()` with `having`
问题描述
我有一个包含记录列表的表.每次迭代都必须从特定的偏移量开始,随机选择其中的一组.每行都有机会被选择(例如,新的或不常被选择的行会被选择更多).
I have a table that contains a list of records. Each iteration, a random set of these must be picked, starting from a specific offset. Each row has a chance to be picked (so e.g. new or not often picked rows are picked more).
但是,某事不起作用,导致使用别名rand()
返回不满足条件的行.
However, something doesn't work, causing rows to be returned that do not satisfy a condition using an aliased rand()
.
我正在尝试使用以下查询:
I'm attempting to use the following query:
select
id,
probability,
rand() rolledChance
from records
where id > :offset
having rolledChance < probability;
其中:offset
是准备好的语句参数,并且是该用户在上一次迭代中最后扫描的ID.
Where :offset
is a prepared statement parameter, and is the last scanned id in the last iteration for this user.
在这样创建的表上(这是表的相关子集):
On a table created like this (which is the relevant subset of the table):
CREATE TABLE records (id INT, probability FLOAT);
在表records
上,概率为0到1之间的值.但是,这将返回条件不满足的行.我通过以下查询对此进行了检查:
Where probability is a value between 0 and 1 on the table records
. However, this returns rows where the condition does not satisfy. I checked this with the following query:
select
*,
x.rolledChance < x.probability shouldPick
from
(select
id,
probability,
rand() rolledChance
from records
having rolledChance < probability
) x;
返回的几行是:
id probability rolledChance shouldPick
12 0.546358 0.015139976530466207 1
26 0.877424 0.9730734508233829 0
46 0.954425 0.35213605347288407 1
当我按如下所示重新使用第二个查询时,它会按预期工作,并且仅返回rolledChance
实际上低于probability
的行:
When I repurpose the second query as follows, it works as expected, and only returns rows where rolledChance
is actually lower than probability
:
select
*,
x.rolledChance < x.probability shouldPick
from
(select id, probability, rand() rolledChance from records) x
where rolledChance < probability;
那我想念什么? probability
和rolledChance
的用法是否与我在比较中所想的不同?每次在同一查询中使用别名时,是否会评估rand()
?
So what am I missing? Are the probability
and rolledChance
used differently than I thought in the comparison? Is the rand()
evaluated every time the alias is used in the same query?
版本输出:mysql Ver 15.1 Distrib 10.0.28-MariaDB, for debian-linux-gnu (x86_64) using readline 5.2
,在Debian Jessie上运行.
Version output: mysql Ver 15.1 Distrib 10.0.28-MariaDB, for debian-linux-gnu (x86_64) using readline 5.2
, running on Debian Jessie.
推荐答案
我认为问题是HAVING在GROUP BY之后但仍在SELECT阶段之前应用.我意识到这很令人困惑,因为HAVING子句引用了SELECT语句中的一列,但是我认为它基本上只执行SELECT语句中的内容两次-一次执行,然后再次执行SELECT.
I think the problem is that HAVING is applied after GROUP BY, but still before the SELECT phase. I realise it's confusing because the HAVING clause references a column from the SELECT statement, but I think it basically just executes whatever is in the SELECT statement twice - once for the having, and then again for the SELECT.
例如,请参见此答案.
请注意,这尤其令人困惑,因为如果在HAVING子句中引用未出现在SELECT语句中的列名,则会抛出错误.
Note, it's especially confusing because if you refer to a column name that doesn't appear in the SELECT statement in a HAVING clause it'll throw an error.
例如,这把小提琴
但是,按照上面的小提琴,它仍然可以让您实际上基于未出现在输出中的函数的结果进行过滤.长话短说,HAVING子句仍然可以满足您的要求,但是您不能同时过滤随机值并使用该方法同时显示它.如果需要这样做,则需要先使用子查询来修复该值,然后外部查询可以进行过滤并在其上显示.
But as per that fiddle above, it'll still let you actually filter based on the result of a function that doesn't appear in the output. Long story short, the HAVING clause is still doing what you want, but you can't both filter on a random value and display it at the same time using that approach. If you need to do that, you need to use a subquery to fix the value first, then the outer query can filter and display on it.
此外,为了清楚起见,仅在hading子句中使用RAND()可能值得,而不是SQL部分.尽管我知道这个问题是在问为什么,但这是在这样做,而不是试图专门解决问题.
Also, to make it clear, it's probably worth just using RAND() in the having clause, not the SQL part. Though I get that this question is asking why it's doing this rather than trying to solve the problem specifically.
这篇关于将`rand()`与`having`一起使用的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!