Can I "group by" an array of dictionaries in Julia?

Question

I'm reading in an array from a JSON file because I need to perform a reduce on it before turning it into a DataFrame for further manipulation. For the sake of argument, let's say this is it:

a = [Dict("A" => 1, "B" => 1, "C" => "a")
     Dict("A" => 1, "B" => 2, "C" => "b")
     Dict("A" => 2, "B" => 1, "C" => "b")
     Dict("A" => 2, "B" => 2, "C" => "a")]

Now, the reduce I'm performing would be greatly simplified if I could group the array by one or more keys (say, A and C), perform a simpler reduce on each group, and recombine the rows later into a larger array of Dicts that I can then easily turn into a DataFrame.

One solution would be to turn this into a DataFrame, split it into groups, turn individual groups into matrices, do the reduce (with some difficulty, because now I've lost the ability to refer to elements by their name), turn the reduced matrices back into (Sub?)DataFrames (with some more difficulty because names), and hope it all comes together nicely into one giant DataFrame.

Any easier and/or more practical way of doing this?

EDIT: Before somebody suggests I look at Query.jl, the reduce I'm running returns an array with fewer rows, because I'm squashing certain pairs of subsequent rows. If I can do such a thing with Query.jl, could somebody hint at how? The documentation isn't exactly clear on how to "aggregate" with anything that doesn't return a single value. Example:

 A   B   C
-----------
 1       a
 2   1   a
 3       b
 4   2   b

should group by "C" and turn that table into something like

 A   B   C
-----------
 1   1   a
 3   2   b

To clarify, the reduce is working; I only want to simplify it by not having to check whether a row belongs to the same group as the previous row before doing the squashing.

Recommended answer

It's still experimental, but SplitApplyCombine.jl might do the trick. You can group arbitrary iterables using any key function you want, and get a key -> group dict out at the end.

julia> ## Pkg.clone("https://github.com/JuliaData/SplitApplyCombine.jl.git")  # Julia 0.6; on Julia 1.0+ use Pkg.add("SplitApplyCombine") instead

julia> using SplitApplyCombine

julia> group(x->x["C"], a)
Dict{Any,Array{Dict{String,Any},1}} with 2 entries:
  "b" => Dict{String,Any}[Dict{String,Any}(Pair{String,Any}("B", 2),Pair{String,Any}("A", 1),Pair{String,Any}("C", "b")), Dict{String,Any}(Pair{String,Any}("…
  "a" => Dict{String,Any}[Dict{String,Any}(Pair{String,Any}("B", 1),Pair{String,Any}("A", 1),Pair{String,Any}("C", "a")), Dict{String,Any}(Pair{String,Any}("…
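For readers who want to see what a key -> group dictionary involves without installing a package, here is a minimal sketch in plain Julia. The helper name `groupby_key` is hypothetical and is not part of SplitApplyCombine.jl; it only illustrates the idea of grouping by an arbitrary key function, including a tuple key for grouping by several fields such as A and C.

```julia
# Hypothetical helper (not from SplitApplyCombine.jl): group any iterable
# by the value returned from `keyfn`, producing a key -> group Dict.
function groupby_key(keyfn, items)
    groups = Dict{Any,Vector{Any}}()
    for item in items
        # get! creates an empty group the first time a key is seen
        push!(get!(() -> Any[], groups, keyfn(item)), item)
    end
    return groups
end

a = [Dict("A" => 1, "B" => 1, "C" => "a"),
     Dict("A" => 1, "B" => 2, "C" => "b"),
     Dict("A" => 2, "B" => 1, "C" => "b"),
     Dict("A" => 2, "B" => 2, "C" => "a")]

by_c  = groupby_key(x -> x["C"], a)             # group by one key
by_ac = groupby_key(x -> (x["A"], x["C"]), a)   # group by several keys via a tuple
```

Because the key function can return any value, grouping by more than one field only requires returning a tuple of those fields.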

Then you can use standard [map]reduce operations (here using the SAC @_ macro for piping):

julia> @_ a |> group(x->x["C"], _) |> values(_) |> reduce(vcat, _)
4-element Array{Dict{String,Any},1}:
 Dict{String,Any}(Pair{String,Any}("B", 2),Pair{String,Any}("A", 1),Pair{String,Any}("C", "b"))
 Dict{String,Any}(Pair{String,Any}("B", 1),Pair{String,Any}("A", 2),Pair{String,Any}("C", "b"))
 Dict{String,Any}(Pair{String,Any}("B", 1),Pair{String,Any}("A", 1),Pair{String,Any}("C", "a"))
 Dict{String,Any}(Pair{String,Any}("B", 2),Pair{String,Any}("A", 2),Pair{String,Any}("C", "a"))
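The pipeline above only splits and recombines. A reduce that returns fewer rows per group, as in the question's example table, could be slotted in before the `vcat`. The following is a rough sketch in plain Julia under stated assumptions: empty B cells are represented as `nothing`, and the `squash` rule (keep the first row's A, take B from the first row that has one) is a hypothetical stand-in for the asker's actual reduce.

```julia
# Rows from the question's example table; `nothing` marks an empty B cell (assumption).
rows = [Dict("A" => 1, "B" => nothing, "C" => "a"),
        Dict("A" => 2, "B" => 1,       "C" => "a"),
        Dict("A" => 3, "B" => nothing, "C" => "b"),
        Dict("A" => 4, "B" => 2,       "C" => "b")]

# Hypothetical per-group reduce: returns an array (here one row) per group.
# Assumes every group has at least one row with a non-empty B.
function squash(group)
    first_row = first(group)
    filled = [r["B"] for r in group if r["B"] !== nothing]
    return [Dict("A" => first_row["A"], "B" => first(filled), "C" => first_row["C"])]
end

# Split: collect rows into groups keyed by "C".
groups = Dict{String,Vector{Any}}()
for r in rows
    push!(get!(() -> Any[], groups, r["C"]), r)
end

# Apply and combine: squash each group, then concatenate the per-group arrays.
# Keys are sorted only to make the output order deterministic.
result = reduce(vcat, [squash(groups[k]) for k in sort(collect(keys(groups)))])
```

Since each group is reduced in isolation, the reduce never has to check whether a row belongs to the same group as the previous one.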
