How to convert a mixed-type Matrix to DataFrame in Julia recognising the column types

Question

One nice feature of DataFrames is that it can store columns of different types and "auto-recognise" them, e.g.:

using DataFrames, DataStructures

df1 = wsv"""
parName region  forType             value
vol     AL      broadL_highF        3.3055628012
vol     AL      con_highF           2.1360975151
vol     AQ      broadL_highF        5.81984502
vol     AQ      con_highF           8.1462998309
"""
typeof(df1[:parName])   # DataArrays.DataArray{String,1}
typeof(df1[:value])     # DataArrays.DataArray{Float64,1}

However, when I try to reach the same result starting from a Matrix (imported from a spreadsheet), I "lose" that auto-conversion:

dataMatrix = [
    "parName"   "region"    "forType"       "value";
    "vol"       "AL"        "broadL_highF"  3.3055628012;
    "vol"       "AL"        "con_highF"     2.1360975151;
    "vol"       "AQ"        "broadL_highF"  5.81984502;
    "vol"       "AQ"        "con_highF"     8.1462998309;
]
h    = [Symbol(c) for c in dataMatrix[1,:]]
vals = dataMatrix[2:end, :]
df2  = convert(DataFrame,OrderedDict(zip(h,[vals[:,i] for i in 1:size(vals,2)])))

typeof(df2[:parName])   # DataArrays.DataArray{Any,1}
typeof(df2[:value])     # DataArrays.DataArray{Any,1}

There are several questions on S.O. about how to convert a Matrix to a DataFrame (e.g. "DataFrame from Array with Header" and "Convert Julia array to dataframe"), but none of the answers there deals with the conversion of a mixed-type matrix.

How can I create a DataFrame from a matrix while auto-recognising the types of the columns?

I benchmarked the three solutions: (1) convert to a DataFrame (using the dictionary or the matrix constructor; the first one is faster) and then apply try-catch for type conversion (my original answer); (2) convert to a string and then use DataFrames.inlinetable (Dan Getz's answer); (3) check the type of each element and their column-wise consistency (Alexander Morley's answer).
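To make approach (3) concrete, here is a minimal sketch of a per-column type check in plain Julia. It is not the code from Alexander Morley's answer (which is not reproduced here); the helper name `narrowcolumn` is hypothetical, and the data is a shortened copy of the matrix from the question.

```julia
# Hypothetical helper, not taken from the original answers: if every entry of
# `col` has the same concrete type T, return the column as a Vector{T};
# otherwise return `col` unchanged.
function narrowcolumn(col::AbstractVector)
    isempty(col) && return col
    T = typeof(first(col))
    return all(x -> typeof(x) === T, col) ? convert(Vector{T}, col) : col
end

# Shortened copy of the matrix from the question (a Matrix{Any}).
dataMatrix = [
    "parName"   "region"    "forType"       "value";
    "vol"       "AL"        "broadL_highF"  3.3055628012;
    "vol"       "AQ"        "con_highF"     8.1462998309
]

h    = Symbol.(dataMatrix[1, :])   # header row as column names
vals = dataMatrix[2:end, :]        # data rows, still untyped
cols = [narrowcolumn(vals[:, j]) for j in 1:size(vals, 2)]

println(eltype.(cols))             # element type of each narrowed column
```

The narrowed vectors in `cols`, together with the names in `h`, can then be handed to whichever DataFrame constructor the installed DataFrames version provides for a list of columns plus a list of names; that last step is an assumption and depends on the package version.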

These are the benchmark results:

# timings taken on the second call (the first includes compilation); further calls give similar results
@time toDf1(m) # 0.000946 seconds (336 allocations: 19.811 KiB)
@time toDf2(m) # 0.000194 seconds (306 allocations: 17.406 KiB)
@time toDf3(m) # 0.001820 seconds (445 allocations: 35.297 KiB)
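The definitions of `toDf1`, `toDf2` and `toDf3` are not shown, so the following sketch only illustrates why the first call is discarded when timing. The function `colsums` is a hypothetical stand-in, not one of the benchmarked functions.

```julia
# Hypothetical stand-in for toDf1/toDf2/toDf3, whose definitions are not shown.
colsums(m) = [sum(m[:, j]) for j in 1:size(m, 2)]

m = rand(4, 4)

t_first  = @elapsed colsums(m)   # first call: also pays for compilation
t_second = @elapsed colsums(m)   # later calls: run time only

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
```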

So, crazy as it is, the most efficient solution seems to be to "pour out the water" and reduce the problem to an already solved one ;-)

Thanks for all the answers.

Recommended answer

Another method would be to reuse the working solution, i.e. convert the matrix into a string appropriate for DataFrames to consume. In code, this is:

using DataFrames

dataMatrix = [
    "parName"   "region"    "forType"       "value";
    "vol"       "AL"        "broadL_highF"  3.3055628012;
    "vol"       "AL"        "con_highF"     2.1360975151;
    "vol"       "AQ"        "broadL_highF"  5.81984502;
    "vol"       "AQ"        "con_highF"     8.1462998309;
]

s = join(
  [join([dataMatrix[i,j] for j in indices(dataMatrix, 2)]
  , '\t') for i in indices(dataMatrix, 1)], '\n')

df = DataFrames.inlinetable(s; separator='\t', header=true)

The resulting df has its column types guessed by DataFrames.
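The code above targets an older Julia release: `indices` was renamed to `axes` in Julia 0.7. As a hedged sketch, the string-building step can be written for current Julia using only Base functions, here with a shortened copy of the matrix. Parsing the string back into a DataFrame is left as a comment, since the parser to use (for example the CSV.jl package) is an assumption and depends on what is installed.

```julia
# Shortened copy of the matrix from the question (a Matrix{Any}).
dataMatrix = [
    "parName"   "region"    "forType"       "value";
    "vol"       "AL"        "broadL_highF"  3.3055628012;
    "vol"       "AQ"        "con_highF"     8.1462998309
]

# One tab-separated line per matrix row, with the rows joined by newlines.
s = join((join(dataMatrix[i, :], '\t') for i in axes(dataMatrix, 1)), '\n')

print(s)

# Assumption, not run here: a delimited-text reader such as CSV.jl could then
# parse `s` from an IOBuffer and infer the column types.
```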

Unrelated, but this answer reminds me of the "how a mathematician boils water" joke.
