Grouping records and getting the standard deviation of intervals for grouped records in BigQuery, getting wrong value
Question
I have the SQL below, which gets the average interval of the timestamp column grouped by icao_address, flight_number and flight_date. I'm trying to do the same for the standard deviation, and although I get a figure, it is wrong. The standard deviation I get back is 14.06 (see the image below), while it should be around 1.8.
Below is what I'm using for the stddev calculation:
STDDEV_POP(UNIX_SECONDS(timestamp))as standard_deviation
Below is my SQL:
#standardSQL
select DATE(timestamp) as flight_date,
  safe_divide(timestamp_diff(max(timestamp), min(timestamp), SECOND), COUNT(DISTINCT timestamp) - 1) as avg_interval_message,
  STDDEV_POP(UNIX_SECONDS(timestamp)) as standard_deviation,
  icao_address, flight_number,
  min(timestamp) as firstrecord, max(timestamp) as lastrecord, count(timestamp) as target_updates
from `ais-data-analysis._analytics._aoi_table`
group by icao_address, flight_number, flight_date
having avg_interval_message is not null and flight_number is not null and icao_address = '4B8E41'
order by flight_date, avg_interval_message ASC
What I'm trying to get is the standard deviation of the intervals between consecutive values of the timestamp column. There are 10 records.
Recommended answer
You can use STDDEV_POP(<FLOAT>) to calculate the standard deviation, as described in the documentation:
Description
Returns the population (biased) standard deviation of the values. The return result is between 0 and +Inf.
This function ignores any NULL inputs. If all inputs are ignored, this function returns NULL.
If this function receives a single non-NULL input, it returns 0.
Supported Input Types
FLOAT64
Optional Clauses
The clauses are applied in the following order:
OVER: Specifies a window. See Analytic Functions. This clause is currently incompatible with all other clauses within STDDEV_POP().
DISTINCT: Each distinct value of expression is aggregated only once into the result.
Return Data Type
FLOAT64
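Note that STDDEV_POP(UNIX_SECONDS(timestamp)) measures the spread of the raw timestamp values, not the spread of the gaps between them, which may be why the figure looks wrong. One possible approach, sketched below and not run against the asker's data, is to compute each gap first with the LAG window function and then aggregate those gaps. The table and column names come from the question; the CTE name intervals and the column interval_seconds are hypothetical names introduced for illustration.

```sql
#standardSQL
-- Sketch only: assumes the schema from the question.
-- `intervals` and `interval_seconds` are illustrative names, not from the original.
WITH intervals AS (
  SELECT
    icao_address,
    flight_number,
    DATE(timestamp) AS flight_date,
    timestamp,
    -- Seconds since the previous record in the same group.
    -- NULL for the first record of each group, which STDDEV_POP and AVG ignore.
    TIMESTAMP_DIFF(
      timestamp,
      LAG(timestamp) OVER (
        PARTITION BY icao_address, flight_number, DATE(timestamp)
        ORDER BY timestamp
      ),
      SECOND
    ) AS interval_seconds
  FROM `ais-data-analysis._analytics._aoi_table`
  WHERE icao_address = '4B8E41'
    AND flight_number IS NOT NULL
)
SELECT
  flight_date,
  AVG(interval_seconds) AS avg_interval_message,
  -- Standard deviation of the gaps, not of the raw timestamps.
  STDDEV_POP(interval_seconds) AS standard_deviation,
  icao_address,
  flight_number,
  MIN(timestamp) AS firstrecord,
  MAX(timestamp) AS lastrecord,
  COUNT(timestamp) AS target_updates
FROM intervals
GROUP BY icao_address, flight_number, flight_date
HAVING avg_interval_message IS NOT NULL
ORDER BY flight_date, avg_interval_message ASC
```

If the table contains duplicate timestamps, AVG(interval_seconds) counts them as zero-length gaps, so it can differ from the original safe_divide expression, which divides by COUNT(DISTINCT timestamp) - 1. Use STDDEV_SAMP instead of STDDEV_POP if the sample standard deviation is wanted.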
I hope this helps.