Using BigQuery to find outliers with standard deviation results combined with a WHERE clause
Question
Standard deviation analysis can be a useful way to find outliers. Is there a way to incorporate the result of this query (which finds the value four standard deviations above the mean)...
SELECT (AVG(weight_pounds) + STDDEV(weight_pounds) * 4) as high FROM [publicdata:samples.natality];
Result: 12.721342001626912
...into another query that reports which states and dates have the most babies born more than 4 standard deviations heavier than average?
SELECT state, year, month, COUNT(*) AS outlier_count
FROM [publicdata:samples.natality]
WHERE
  (weight_pounds > 12.721342001626912)
  AND
  (state != '' AND state IS NOT NULL)
GROUP BY state, year, month
ORDER BY outlier_count DESC;
Results:
Row state year month outlier_count
1 MD 1990 12 22
2 NY 1989 10 17
3 CA 1991 9 14
Essentially, it would be great to combine these into a single query.
Recommended answer
You can abuse JOIN for this (though performance will suffer). Both subqueries emit a constant key of 1, so the join attaches the single-row threshold to every row of the table:
SELECT n.state, n.year, n.month, COUNT(*) AS outlier_count
FROM (
  SELECT state, year, month, weight_pounds, 1 AS key
  FROM [publicdata:samples.natality]) AS n
JOIN (
  SELECT (AVG(weight_pounds) + STDDEV(weight_pounds) * 4) AS giant_baby,
    1 AS key
  FROM [publicdata:samples.natality]) AS o
ON n.key = o.key
WHERE
  (n.weight_pounds > o.giant_baby)
  AND
  (n.state != '' AND n.state IS NOT NULL)
GROUP BY n.state, n.year, n.month
ORDER BY outlier_count DESC;
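As a possible alternative, if the query is run in BigQuery standard SQL rather than the legacy dialect used above, the threshold could be written as an uncorrelated scalar subquery inside the WHERE clause, so no dummy join key is needed. This is only a sketch under assumptions: the table path `bigquery-public-data.samples.natality` is assumed to be the standard SQL name for the same public sample table, and nothing here measures how its performance compares with the JOIN version.

```sql
-- Sketch in BigQuery standard SQL (an assumption; the queries above use legacy SQL).
-- The table path below is assumed to refer to the same public natality sample.
SELECT
  state,
  year,
  month,
  COUNT(*) AS outlier_count
FROM `bigquery-public-data.samples.natality`
WHERE
  -- Scalar subquery: returns one value, the mean plus 4 standard deviations.
  weight_pounds > (
    SELECT AVG(weight_pounds) + 4 * STDDEV(weight_pounds)
    FROM `bigquery-public-data.samples.natality`
  )
  -- Same state filter as in the original question.
  AND state != ''
  AND state IS NOT NULL
GROUP BY state, year, month
ORDER BY outlier_count DESC;
```

The subquery does not reference the outer row, so it yields a single threshold value that plays the same role as o.giant_baby in the JOIN version.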