How to convert a mixed-type Matrix to DataFrame in Julia recognising the column types
Problem description
One nice feature of DataFrames is that it can store columns with different types and it can "auto-recognise" them, e.g.:
using DataFrames, DataStructures
df1 = wsv"""
parName region forType value
vol AL broadL_highF 3.3055628012
vol AL con_highF 2.1360975151
vol AQ broadL_highF 5.81984502
vol AQ con_highF 8.1462998309
"""
typeof(df1[:parName])
DataArrays.DataArray{String,1}
typeof(df1[:value])
DataArrays.DataArray{Float64,1}
However, when I try to reach the same result starting from a Matrix (imported from a spreadsheet), I "lose" that auto-conversion:
dataMatrix = [
"parName" "region" "forType" "value";
"vol" "AL" "broadL_highF" 3.3055628012;
"vol" "AL" "con_highF" 2.1360975151;
"vol" "AQ" "broadL_highF" 5.81984502;
"vol" "AQ" "con_highF" 8.1462998309;
]
h = [Symbol(c) for c in dataMatrix[1,:]]
vals = dataMatrix[2:end, :]
df2 = convert(DataFrame,OrderedDict(zip(h,[vals[:,i] for i in 1:size(vals,2)])))
typeof(df2[:parName])
DataArrays.DataArray{Any,1}
typeof(df2[:value])
DataArrays.DataArray{Any,1}
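A likely reason for the Any columns, shown as a minimal sketch in plain Julia (assuming a recent 1.x release, no DataFrames needed): a matrix literal that mixes strings and numbers gets element type Any, so every column sliced from it is a vector of Any. Re-collecting a column, for example by broadcasting identity over it, lets Julia pick the narrowest common element type. The names m, col and narrowed are illustrative only.

```julia
# A small mixed-type matrix: strings in column 1, floats in column 2.
m = ["vol" 3.3055628012
     "vol" 2.1360975151]

# The matrix as a whole can only be typed as Any...
@show eltype(m)

# ...so a column slice inherits Any, even though it holds only floats.
col = m[:, 2]
@show eltype(col)

# Broadcasting identity rebuilds the vector with a narrowed element type.
narrowed = identity.(col)
@show eltype(narrowed)
```

Here eltype(m) and eltype(col) are Any, while eltype(narrowed) is Float64. A column that really mixes types (for instance one that still contains the header string) stays Any after narrowing.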
There are several questions on S.O. on how to convert a Matrix to a DataFrame (e.g. DataFrame from Array with Header, Convert Julia array to dataframe), but none of the answers there deals with the conversion of a mixed-type matrix.
How could I create a DataFrame from a matrix, auto-recognising the types of the columns?
I benchmarked the three solutions: (1) convert to a df (using the dictionary or the matrix constructor; the first one is faster) and then apply try-catch for type conversion (my original answer); (2) convert to a string and then use DataFrames.inlinetable (Dan Getz's answer); (3) check the type of each element and their column-wise consistency (Alexander Morley's answer).
These are the results:
# timings from the second run (after compilation); further runs give similar results
@time toDf1(m) # 0.000946 seconds (336 allocations: 19.811 KiB)
@time toDf2(m) # 0.000194 seconds (306 allocations: 17.406 KiB)
@time toDf3(m) # 0.001820 seconds (445 allocations: 35.297 KiB)
So, crazy as it is, the most efficient solution seems to be to "pour out the water" and reduce the problem to an already solved one ;-)
Thank you all for the answers.
Recommended answer
Another method would be to reuse the working solution, i.e. convert the matrix into a string appropriate for DataFrames to consume. In code, this is:
using DataFrames
dataMatrix = [
"parName" "region" "forType" "value";
"vol" "AL" "broadL_highF" 3.3055628012;
"vol" "AL" "con_highF" 2.1360975151;
"vol" "AQ" "broadL_highF" 5.81984502;
"vol" "AQ" "con_highF" 8.1462998309;
]
s = join(
    [join([dataMatrix[i,j] for j in indices(dataMatrix, 2)], ' ')
     for i in indices(dataMatrix, 1)], '\n')
df = DataFrames.inlinetable(s; separator=' ', header=true)
The resulting df has its column types guessed by DataFrames.
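For comparison, the column-wise type check mentioned in the question can be sketched without the round trip through a string. This is only a sketch under assumptions: it targets a current DataFrames.jl 1.x release (where the DataFrame(columns, names) constructor is available) and not the older version used above, and matrix_to_df is a hypothetical helper name, not something taken from the answers.

```julia
using DataFrames

# Hypothetical helper: the first row holds the header, the rest the data.
function matrix_to_df(m::AbstractMatrix)
    header = Symbol.(m[1, :])          # column names from the first row
    vals = m[2:end, :]                 # data rows only
    # Narrow each column from Any to the tightest common element type.
    cols = [identity.(vals[:, j]) for j in axes(vals, 2)]
    return DataFrame(cols, header)
end

dataMatrix = [
    "parName" "region" "forType"      "value"
    "vol"     "AL"     "broadL_highF" 3.3055628012
    "vol"     "AL"     "con_highF"    2.1360975151
    "vol"     "AQ"     "broadL_highF" 5.81984502
    "vol"     "AQ"     "con_highF"    8.1462998309
]

df = matrix_to_df(dataMatrix)
@show eltype(df.parName) eltype(df.value)
```

Unlike the string-based approach, this only narrows the types the values already have: a number stored as a string (say "3.3") would stay a String and would need explicit parsing.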
Unrelated, but this answer reminds me of the "how a mathematician boils water" joke.