Reading JSON array into Julia DataFrame-like type
Problem description
Given a JSON file, the JSON package happily parses it. But if I would like it as a DataFrame
(or any other columnar data structure), what would be a good way to get it?
For example, currently I have:
using JSON
using DataFrames
json_str = """
[{ "color": "red", "value": "#f00" }, { "color": "green", "value": "#0f0" },
{ "color": "blue", "value": "#00f" }, { "color": "cyan", "value": "#0ff" },
{ "color": "magenta", "value": "#f0f" }, { "color": "yellow", "value": "#ff0" },
{ "color": "black", "value": "#000" } ]
"""
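function jsontodf(a)
    # union of the keys of all records, so records with absent fields are covered
    ka = union([keys(r) for r in a]...)
    # one column per key; records that lack a key get NA
    df = DataFrame(;Dict(Symbol(k)=>get.(a,k,NA) for k in ka)...)
    return df
end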
a = JSON.Parser.parse(json_str)
jsontodf(a)
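The NA value used above belongs to the Julia 0.6-era DataFrames/DataArrays stack; current DataFrames.jl represents absent values with missing instead. The sketch below keeps the same union-of-keys idea but uses missing. It assumes Julia 1.x with the JSON.jl and DataFrames.jl packages installed, and the helper name jsontodf_missing is hypothetical, not part of either package.

```julia
using JSON
using DataFrames

# Hypothetical helper (not from the question): same union-of-keys approach as
# jsontodf, but absent fields are filled with `missing` instead of NA.
function jsontodf_missing(a)
    # Every key that occurs in any record, in first-seen order.
    ka = unique(reduce(vcat, [collect(keys(r)) for r in a]))
    # One column per key; records that lack the key contribute `missing`.
    return DataFrame((Symbol(k) => [get(r, k, missing) for r in a] for k in ka)...)
end

# Example input in which the second record has no "value" field.
a = JSON.parse("""[{ "color": "red", "value": "#f00" }, { "color": "green" }]""")
df = jsontodf_missing(a)
```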
This results in:
7×2 DataFrames.DataFrame
│ Row │ color     │ value  │
├─────┼───────────┼────────┤
│ 1   │ "red"     │ "#f00" │
│ 2   │ "green"   │ "#0f0" │
│ 3   │ "blue"    │ "#00f" │
│ 4   │ "cyan"    │ "#0ff" │
│ 5   │ "magenta" │ "#f0f" │
│ 6   │ "yellow"  │ "#ff0" │
│ 7   │ "black"   │ "#000" │
It also handles missing fields by filling them with NA. Is there anything cleaner or faster (Julia v0.6+)?
Recommended answer
I have dug out this old question, and now we have a better solution for it as of DataFrames.jl 0.18.0.
If all entries in the JSON array have the same fields, you can write:
reduce(vcat, DataFrame.(a))
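As a runnable illustration on a shortened version of the question's data, the sketch below assumes Julia 1.x with JSON.jl and a recent DataFrames.jl, in which DataFrame applied to a dictionary of scalar values builds a one-row data frame. Broadcasting DataFrame over the parsed array gives one such row per JSON object, and reduce(vcat, ...) stacks them; with the default settings every row must have the same column names.

```julia
using JSON
using DataFrames

a = JSON.parse("""
[{ "color": "red", "value": "#f00" }, { "color": "green", "value": "#0f0" },
 { "color": "blue", "value": "#00f" }]
""")

# One single-row DataFrame per JSON object...
rows = DataFrame.(a)
# ...stacked vertically; this requires all rows to share the same columns.
df = reduce(vcat, rows)
```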
If you have to handle the possibility of different fields in each dict, write:
vcat(DataFrame.(a)..., cols=:union)
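The following sketch shows what cols=:union does when the records disagree. The input is hypothetical: the third object has no "value" field and the fourth carries an extra "alpha" field. It assumes Julia 1.x with JSON.jl and a DataFrames.jl version that supports the cols keyword of vcat; every column seen in any row is kept, and gaps are filled with missing.

```julia
using JSON
using DataFrames

# Hypothetical input with an absent field and an extra field.
a = JSON.parse("""
[{ "color": "red", "value": "#f00" }, { "color": "green", "value": "#0f0" },
 { "color": "blue" }, { "color": "cyan", "value": "#0ff", "alpha": 0.5 }]
""")

# cols=:union keeps the union of all column names and fills gaps with `missing`.
df = vcat(DataFrame.(a)..., cols=:union)
```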
This can be slightly problematic if the vector a has a lot of entries, because it relies on splatting. I have just submitted a PR so that you will also be able to write:
reduce(vcat, DataFrame.(a), cols=:union)
in the near future.
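A minimal sketch of the reduce form, assuming Julia 1.x, JSON.jl, and a DataFrames.jl release in which the PR mentioned above has landed, so that reduce(vcat, ...) accepts the cols keyword. Unlike the splatting version, the whole vector of data frames is passed as a single argument.

```julia
using JSON
using DataFrames

# Hypothetical input: the second object has no "value" field.
a = JSON.parse("""[{ "color": "red", "value": "#f00" }, { "color": "blue" }]""")

# reduce takes the vector of data frames directly instead of splatting it.
df = reduce(vcat, DataFrame.(a), cols=:union)
```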