Hive SQL中的嵌套查询 [英] Nested queries in Hive SQL
问题描述
我有一个数据库,并且使用一个查询来生成这样的中间表:
id ab
xx 1 2
yy 7 11
我想计算标准偏差b对于具有<平均(a)
我这样计算avg(a),它工作正常:
select avg(从表中选择一个(查询来产生中间表)));
但是查询:
选择stddev_pop(b)
from(query to produce intermediate table)
其中a< (选择avg(从表中选择一个
(查询产生中间表))
);
返回我的错误,更确切地说,我被告知a来自avg(select a ...)不被识别。这让我非常困惑,因为它在前面的查询中起作用。
如果有人可以提供帮助,我将不胜感激。
编辑:
我将查询的结果存储到临时表中,但仍然遇到同样的问题。
非工作查询变为:
select stddev_pop(b)from temp where a< (从temp中选择avg(a));
同时工作:
从temp中选择avg(a);
好的,一位同事帮助我做到了。如果有人遇到同样的问题,我会发布答案:
select stddev_pop(b)
from temp x
加入(从温度选择avg(a)作为平均值)y
其中xa < y.average;
基本上,配置单元不会将表缓存为变量。
I have a database, and I use a query to produce an intermediate table like this:
id a b
xx 1 2
yy 7 11
and I would like to calculate the standard deviations of b for the users who have a < avg(a)
I calculate avg(a) that way and it works fine:
select avg(select a from (query to produce intermediate table)) from table;
But the query:
select stddev_pop(b)
from (query to produce intermediate table)
where a < (select avg(select a
from (query to produce intermediate table))
from table);
Returns me an error, and more precisely, I am told that the "a" from avg(select a from...) is not recognised. This makes me really confused, as it works in the previous query.
I would be grateful if somebody could help.
EDIT:
I stored the result of my query to generate the intermediary table into a temporary table, but still run into the same problem. The non working query becomes:
select stddev_pop(b) from temp where a < (select avg(a) from temp);
while this works:
select avg(a) from temp;
OK, a colleague helped me to do it. I'll post the answer in case someone runs into the same problem:
select stddev_pop(b)
from temp x
join (select avg(a) as average from temp) y
where x.a < y.average;
Basically hive doesn't do caching of a table as a variable.
这篇关于Hive SQL中的嵌套查询的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!