How to convert a mixed-type Matrix to a DataFrame in Julia, recognising the column types

Problem description

One nice feature of DataFrames is that it can store columns with different types and it can "auto-recognise" them, e.g.:

using DataFrames, DataStructures

df1 = wsv"""
parName region  forType             value
vol     AL      broadL_highF        3.3055628012
vol     AL      con_highF           2.1360975151
vol     AQ      broadL_highF        5.81984502
vol     AQ      con_highF           8.1462998309
"""
typeof(df1[:parName])  # DataArrays.DataArray{String,1}
typeof(df1[:value])    # DataArrays.DataArray{Float64,1}

When I try, however, to reach the same result starting from a Matrix (imported from a spreadsheet), I lose that auto-conversion:

dataMatrix = [
    "parName"   "region"    "forType"       "value";
    "vol"       "AL"        "broadL_highF"  3.3055628012;
    "vol"       "AL"        "con_highF"     2.1360975151;
    "vol"       "AQ"        "broadL_highF"  5.81984502;
    "vol"       "AQ"        "con_highF"     8.1462998309;
]
h    = [Symbol(c) for c in dataMatrix[1,:]]
vals = dataMatrix[2:end, :]
df2  = convert(DataFrame,OrderedDict(zip(h,[vals[:,i] for i in 1:size(vals,2)])))

typeof(df2[:parName])  # DataArrays.DataArray{Any,1}
typeof(df2[:value])    # DataArrays.DataArray{Any,1}
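The `Any` element type is inherited from the matrix itself: a literal that mixes strings and floats is built as a `Matrix{Any}`, so every column sliced out of it is a `Vector{Any}` too. As a minimal sketch, assuming Julia 1.x and using only Base (the helper `matrix_columns` is hypothetical, not part of DataFrames.jl), each column can be rebuilt with `identity.(col)`, which lets broadcasting choose the element type from the values actually present:

```julia
# Hypothetical helper (not part of DataFrames.jl): turn a header-first,
# mixed-type matrix into a NamedTuple of columns with narrowed element types.
function matrix_columns(m::AbstractMatrix)
    header = Symbol.(m[1, :])  # first row holds the column names
    body   = m[2:end, :]       # remaining rows hold the data, still eltype Any
    # identity.(col) rebuilds each column; broadcasting picks the element type
    # from the values present (String, Float64, or Any for a string/number mix)
    cols = [identity.(body[:, j]) for j in axes(body, 2)]
    return NamedTuple{Tuple(header)}(Tuple(cols))
end

# same data as above, rows separated by newlines
dataMatrix = [
    "parName"   "region"    "forType"       "value"
    "vol"       "AL"        "broadL_highF"  3.3055628012
    "vol"       "AL"        "con_highF"     2.1360975151
    "vol"       "AQ"        "broadL_highF"  5.81984502
    "vol"       "AQ"        "con_highF"     8.1462998309
]

named_cols = matrix_columns(dataMatrix)
println(eltype(dataMatrix))          # Any
println(eltype(named_cols.parName))  # String
println(eltype(named_cols.value))    # Float64
```

A column that genuinely mixes strings and numbers stays `Vector{Any}`, which is the honest answer for such data. With recent DataFrames.jl versions a NamedTuple of vectors like this can be passed to `DataFrame(...)`; that step is an assumption about the current API and was not part of the original thread.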

There are several questions on S.O. on how to convert a Matrix to a DataFrame (e.g. DataFrame from Array with Header, Convert Julia array to dataframe), but none of the answers there deal with the conversion of a mixed-type matrix.

How could I create a DataFrame from a matrix, auto-recognising the type of the columns?

I benchmarked the three solutions: (1) convert to a DataFrame (using either the dictionary or the matrix constructor; the former is faster) and then apply try-catch for type conversion (my original answer); (2) convert to a string and then use DataFrames.inlinetable (Dan Getz's answer); (3) check the type of each element and its column-wise consistency (Alexander Morley's answer).
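The functions `toDf1`, `toDf2` and `toDf3` used in the benchmark are not shown in the post. As a rough illustration only, the column-level core of strategies (1) and (3) might look like the following Julia 1.x sketch; `narrow_trycatch` and `narrow_promote` are hypothetical names, and the candidate type list in (1) is an assumption:

```julia
# Hypothetical column-level sketches of benchmarked strategies (1) and (3);
# the real toDf1/toDf3 functions are not shown in the post.

# (1) try-catch: try candidate types in order and keep the first conversion
# that succeeds (the candidate list is an assumption).
function narrow_trycatch(col::AbstractVector)
    for T in (Float64, String)
        try
            return convert(Vector{T}, col)
        catch
            # conversion failed, e.g. MethodError for String -> Float64
        end
    end
    return col  # nothing worked: keep the original column
end

# (3) type check: promote the types of all elements, then convert once.
# Assumes a non-empty column (mapreduce without init fails on empty input).
function narrow_promote(col::AbstractVector)
    T = mapreduce(typeof, promote_type, col)  # e.g. Float64, String or Any
    return convert(Vector{T}, col)
end

value_col  = Any[3.3055628012, 2.1360975151, 5.81984502, 8.1462998309]
region_col = Any["AL", "AL", "AQ", "AQ"]
mixed_col  = Any[1, 2.5]  # Int and Float64 promote to Float64

println(eltype(narrow_trycatch(value_col)))   # Float64
println(eltype(narrow_trycatch(region_col)))  # String
println(eltype(narrow_promote(mixed_col)))    # Float64
```

The try-catch version depends on the order of the candidate types and pays for every failed attempt with a thrown exception, while the promotion version looks at each element's type once and converts in a single pass.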

These are the results:

# timings from the second call (the first one includes compilation); later calls give similar results
@time toDf1(m) # 0.000946 seconds (336 allocations: 19.811 KiB)
@time toDf2(m) # 0.000194 seconds (306 allocations: 17.406 KiB)
@time toDf3(m) # 0.001820 seconds (445 allocations: 35.297 KiB)

So, crazy as it is, the most efficient solution seems to be to "pour out the water" and reduce the problem to an already solved one ;-)

Thanks everyone for the answers.

Recommended answer

Another method would be to reuse the working solution, i.e. convert the matrix into a string that DataFrames can consume. In code:

using DataFrames

dataMatrix = [
    "parName"   "region"    "forType"       "value";
    "vol"       "AL"        "broadL_highF"  3.3055628012;
    "vol"       "AL"        "con_highF"     2.1360975151;
    "vol"       "AQ"        "broadL_highF"  5.81984502;
    "vol"       "AQ"        "con_highF"     8.1462998309;
]

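# build a tab-separated string, one line per matrix row
# (indices() is the Julia 0.6 name; Julia 0.7 and later use axes())
s = join(
  [join([dataMatrix[i,j] for j in indices(dataMatrix, 2)], '\t')
   for i in indices(dataMatrix, 1)], '\n')

# let DataFrames parse the string and guess the column types
df = DataFrames.inlinetable(s; separator='\t', header=true)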

The resulting df has its column types guessed by DataFrames.
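`indices` and `DataFrames.inlinetable` come from older Julia and DataFrames versions. The round trip itself can be sketched in Julia 1.x Base: `to_tsv` writes the matrix as tab-separated text (using `axes`, the current name for `indices`), and `parse_tsv` is a deliberately naive, hypothetical stand-in for a real parser that guesses each column as either `Float64` or `String`:

```julia
# Hypothetical helper: write a header-first matrix as tab-separated text.
# axes(m, 1) is the Julia 1.x replacement for indices(m, 1).
to_tsv(m::AbstractMatrix) = join([join(m[i, :], '\t') for i in axes(m, 1)], '\n')

# Hypothetical, deliberately naive type guesser: Float64 if every cell
# parses as a number, otherwise String.
function guess_column(cells::AbstractVector{<:AbstractString})
    nums = [tryparse(Float64, c) for c in cells]  # `nothing` where parsing fails
    return any(isnothing, nums) ? String.(cells) : Float64.(nums)
end

# Hypothetical stand-in for a real table parser; the first line is the header.
function parse_tsv(s::AbstractString)
    rows   = [split(line, '\t') for line in split(s, '\n')]
    header = Symbol.(rows[1])
    body   = rows[2:end]
    cols   = [guess_column([row[j] for row in body]) for j in eachindex(header)]
    return NamedTuple{Tuple(header)}(Tuple(cols))
end

dataMatrix = [
    "parName"   "region"    "forType"       "value"
    "vol"       "AL"        "broadL_highF"  3.3055628012
    "vol"       "AL"        "con_highF"     2.1360975151
    "vol"       "AQ"        "broadL_highF"  5.81984502
    "vol"       "AQ"        "con_highF"     8.1462998309
]

tbl = parse_tsv(to_tsv(dataMatrix))
println(eltype(tbl.region))  # String
println(eltype(tbl.value))   # Float64
```

In practice the parsing step would be handed to a package such as CSV.jl, whose type inference also covers integers, dates and missing values; the two-type guesser here only illustrates the idea of reducing the problem to text parsing.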

Unrelated, but this answer reminds me of the "how a mathematician boils water" joke.