Should I use a data.frame or a matrix?
Problem description
When should one use a data.frame, and when is it better to use a matrix?
Both keep data in a rectangular format, so sometimes it's unclear.
Are there any general rules of thumb for when to use which data type?
Recommended answer
Part of the answer is already contained in your question: use a data frame if the columns (variables) can be expected to be of different types (numeric, character, logical, etc.). Matrices are for data of a single type.
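To make the type rule concrete, here is a minimal sketch; the column names id and name are made up for illustration. It shows what happens when mixed-type data is forced into a matrix compared with a data frame.

```r
# A matrix holds one type only, so mixing numbers and strings
# coerces every element to character.
m <- cbind(id = 1:3, name = c("a", "b", "c"))
typeof(m)

# A data frame keeps a separate type for each column.
# stringsAsFactors = FALSE keeps the behaviour the same on older R versions.
d <- data.frame(id = 1:3, name = c("a", "b", "c"), stringsAsFactors = FALSE)
sapply(d, class)
```

In the matrix the ids end up as strings, so arithmetic on them needs a conversion back to numeric first.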
Consequently, choosing between a matrix and a data.frame is only difficult when all of your data is of the same type.
In that case the answer depends on what you are going to do with the data. If it is going to be passed to other functions, then the argument types those functions expect determine the choice.
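As a rough illustration of letting the consumer decide, the sketch below uses lm() and crossprod() as example functions (my choice, not named in the answer) and converts between the two structures with as.matrix() and as.data.frame().

```r
d <- data.frame(x = c(1, 2, 3), y = c(2, 4, 7))

# Modelling functions such as lm() typically take a data frame via `data`.
fit <- lm(y ~ x, data = d)

# Matrix-oriented functions such as crossprod() work on a matrix,
# so convert the data frame first.
m <- as.matrix(d)
xtx <- crossprod(m)   # same as t(m) %*% m

# Converting back is equally short.
d2 <- as.data.frame(m)
```

Since conversion is cheap for small, same-type data, it can be reasonable to store the data in whichever form most of your downstream functions expect and convert at the edges.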
Also:
Matrices are more memory efficient:
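m <- matrix(1:4, 2, 2)   # 2 x 2 integer matrix
d <- as.data.frame(m)    # the same data stored as a data frame
# Sizes reported by the author; exact values vary with R version and platform.
object.size(m)
# 216 bytes
object.size(d)
# 792 bytes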
Matrices are a necessity if you plan to do any linear-algebra type of operations.
Data frames are more convenient if you frequently refer to their columns by name (via the compact $ operator).
Data frames are also, IMHO, better for reporting (printing) tabular information, as you can apply formatting to each column separately.
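The last three points can be sketched in a few lines. This is only an illustration: the names a, b, item, share and count are hypothetical, and the formatting calls are one possible choice among many.

```r
m <- matrix(c(2, 1, 1, 3), nrow = 2, dimnames = list(NULL, c("a", "b")))
d <- as.data.frame(m)

# Linear algebra works directly on the matrix.
m_inv <- solve(m)        # matrix inverse
prod <- m %*% m_inv      # approximately the identity matrix

# Column access by name: `$` for data frames, indexing for matrices.
d$a
m[, "a"]

# Per-column formatting before printing a small report.
report <- data.frame(
  item  = c("x", "y"),
  share = c(0.1234, 0.5),
  count = c(1200L, 35L),
  stringsAsFactors = FALSE
)
report$share <- sprintf("%.1f%%", 100 * report$share)   # as percentages
report$count <- format(report$count, big.mark = ",")    # thousands separator
print(report)
```

Note that `$` does not work on a matrix (it is an atomic vector), so named access there goes through m[, "a"].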