Multiple summary statistics on grouped column in Julia

Published: 2021/5/28 18:45:44 | Tag: julia

Problem description

I am trying to get the code below to work with Julia (1.5.3). It is just a representation of what I am trying to do.

using DataFrames
using DataFramesMeta
using RDatasets
using Statistics  # provides mean and median used below

## setup
iris = dataset("datasets", "iris")
gdf = groupby(iris, :Species)

## Applying the split combine
## This code works fine
combine(gdf, nrow, (valuecols(gdf) .=> mean))

But when I try to compute multiple summaries at once, it fails:

 combine(gdf, nrow, (valuecols(gdf) .=> [mean, sum]))

Error:

ERROR: DimensionMismatch("arrays could not be broadcast to a common size; got a dimension with lengths 4 and 2")
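
The mismatch seems to come from broadcasting rather than from combine itself: `valuecols(gdf)` holds four column names, `[mean, sum]` holds two functions, and `.=>` pairs elements position by position. A minimal sketch using only Base and the Statistics standard library, assuming a hand-typed `cols` vector as a stand-in for `valuecols(gdf)` (the names `cols`, `single`, `mismatch` and `grid` are illustrative):

```julia
using Statistics  # provides mean

# Stand-in for valuecols(gdf): the four non-grouping columns of iris
cols = [:SepalLength, :SepalWidth, :PetalLength, :PetalWidth]

# A function is treated as a scalar, so this yields one pair per column
single = cols .=> mean

# A 4-element vector against a 2-element vector has no common shape
mismatch = try
    cols .=> [mean, sum]
    false
catch err
    err isa DimensionMismatch
end

# A 1×2 row of functions (no comma) does broadcast against the
# 4-element column, giving every (column, function) combination
grid = cols .=> [mean sum]
```

Here `grid` is a 4×2 matrix of pairs, which is the full set of combinations the failing call was meant to produce.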

A little debugging of the error suggests that I can change my code to this:

combine(gdf, nrow, ([:SepalLength, :PetalLength] .=> [mean,sum]))
## This code works, but it is still not what I want: it does not give the mean and sum of both columns, only the mean of SepalLength and the sum of PetalLength, which is consistent with the previous error

After a little more research I realized that we can do something like the following. The result is correct, but the output is a table in long form rather than wide form. I was expecting this to answer my question, but it does not work the way I expected.

 combine(gdf, ([:SepalWidth, :PetalWidth] .=>  x -> ([sum(x), mean(x)])))

 ## The code above works, but the output is a 6x3 DataFrame; I was expecting a 3x6 DataFrame

My question is:

Is there any way to use split-combine so that I get a wide table like the one below (I used "do end" with "combine" to generate it)? I am okay with this solution, but I need to type out every column here. Is there a way to get all the summary statistics (sum, median, mean, etc.) as columns for every column passed to combine? I hope my question is clear; please point out if it is a duplicate or not well communicated. Thanks.

combine(gdf) do x
    return(sw_sum = sum(x.SepalWidth), 
           sw_mean = mean(x.SepalWidth), 
           sp_mean = mean(x.PetalWidth), 
           sp_sum = sum(x.PetalWidth)
          )
end



## My expected answer should be similar to this
#3×5 DataFrame
# Row │ Species     sw_sum   sw_mean  sp_mean  sp_sum
#     │ Cat…        Float64  Float64  Float64  Float64
#─────┼────────────────────────────────────────────────
#   1 │ setosa        171.4    3.428    0.246     12.3
#   2 │ versicolor    138.5    2.77     1.326     66.3
#   3 │ virginica     148.7    2.974    2.026    101.3
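
One possible way to get this wide layout without typing every column, offered as a sketch and not a definitive answer: build one `column => function` pair for every combination of value column and statistic, and pass the resulting vector to combine. The example assumes a DataFrames.jl version that accepts a vector of pairs in combine (the `valuecols(gdf) .=> mean` call earlier relies on the same feature). It uses a small made-up table in place of iris so that it runs without RDatasets; the names `toy`, `stats`, `ops` and `wide` are illustrative.

```julia
using DataFrames
using Statistics  # provides mean

# Made-up stand-in for iris: one grouping column, two value columns
toy = DataFrame(Species = ["a", "a", "b", "b"],
                SepalWidth = [1.0, 3.0, 2.0, 4.0],
                PetalWidth = [0.5, 1.5, 1.0, 2.0])
gdf = groupby(toy, :Species)

stats = [sum, mean]

# One `column => function` pair per (column, statistic) combination
ops = [col => f for col in valuecols(gdf) for f in stats]

# Each function returns a scalar per group, so there is one row per group;
# output columns get automatic names such as SepalWidth_sum
wide = combine(gdf, nrow, ops)
```

Because every function returns a single value per group, the result stays wide (one row per species). Returning a vector from the function, as in the earlier anonymous-function attempt, is what produces several rows per group.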

Also, this works:

 combine(gdf, [:1] .=> [mean, sum, minimum, maximum,median])

But this does not, and it throws the dimension error shown above. I am still scratching my head over this:

combine(gdf, [:1, :2] .=> [mean, sum, minimum, maximum,median])
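
A likely explanation, sketched with Base and the Statistics standard library only: broadcasting stretches a one-element collection to match the other operand, so a single column can be paired with five functions, while two columns against five functions have no common shape. Note also that `:1` is simply the integer 1, so `[:1]` selects the first column by position. The names `funs`, `one_col` and `two_cols_fail` are illustrative.

```julia
using Statistics  # provides mean and median

funs = [mean, sum, minimum, maximum, median]

# Quoting a numeric literal gives back the number itself
is_int = :1 === 1

# The length-1 vector is expanded along the five functions
one_col = [:1] .=> funs

# Lengths 2 and 5 cannot be broadcast together
two_cols_fail = try
    [:1, :2] .=> funs
    false
catch err
    err isa DimensionMismatch
end
```

So the first call computes five statistics for column 1, and the second fails for the same reason as the four-versus-two case above it in the question.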
