What determines the size of a saved object in R?

Question

When I save an object from R using save(), what determines the size of the saved file? Clearly it is not the same (or close to) the size of the object determined by object.size().

Example: I read a data frame and saved it using

snpmat=read.table("Heart.txt.gz",header=TRUE)
save(snpmat,file="datamat.RData")

The size of the file datamat.RData is 360MB.

> object.size(snpmat)
4998850664 bytes        #Much larger

Then I performed some regression analysis and obtained another data frame adj.snpmat of same dimensions (6820000 rows and 80 columns).

> object.size(adj.snpmat)
4971567760 bytes       

I saved it using

> save(adj.snpmat,file="adj.datamat.RData")

Now the file adj.datamat.RData is 3.3 GB. I'm confused about why the two files differ so much in size while object.size() reports similar sizes. Any ideas about what determines the size of the saved file are welcome.

More information:

> typeof(snpmat)
[1] "list"

> class(snpmat)
[1] "data.frame"

> typeof(snpmat[,1])
[1] "integer"

> typeof(snpmat[,2])
[1] "double"         #This is true for all columns except column 1

> typeof(adj.snpmat)
[1] "list"

> class(adj.snpmat)
[1] "data.frame"

> typeof(adj.snpmat[,1])
[1] "character"

> typeof(adj.snpmat[,2])
[1] "double"         #This is true for all columns except column 1

Recommended answer

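By default, save() writes a gzip-compressed file, whereas object.size() reports the uncompressed size in memory, so the file size depends on how compressible the data are. Your two data frames contain very different data and therefore compress very differently.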
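
To see that compression is what separates the file size from object.size(), here is a minimal sketch that saves the same object twice: once with compress = FALSE and once with save()'s default gzip compression. The vector x and its SNP-like 0/1/2 values are made up for illustration and are not taken from the question. The uncompressed file should come out close to object.size(), while the default file should be much smaller for repetitive data like this.

```r
# Hypothetical SNP-like data: 1e6 doubles drawn from {0, 1, 2}
set.seed(1)
x <- as.numeric(sample(0:2, 1e6, replace = TRUE))

# Save the same object twice: uncompressed, and with save()'s default (gzip)
f_raw <- tempfile(fileext = ".RData")
f_gz  <- tempfile(fileext = ".RData")
save(x, file = f_raw, compress = FALSE)
save(x, file = f_gz)

in_memory <- as.numeric(object.size(x))
raw_size  <- file.size(f_raw)
gz_size   <- file.size(f_gz)

cat("object.size():         ", in_memory, "bytes\n")
cat("save(compress = FALSE):", raw_size, "bytes\n")
cat("save() default (gzip): ", gz_size, "bytes\n")
```

The exact compressed size depends on the data, which is the point: two objects with the same object.size() can produce very different files.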

SNP data contain only a few distinct values (e.g., 1 or 0) and are also very sparse. This means that they are very easy to compress. For example, if you had a matrix of all zeros, you could compress it by storing just the single value (0) together with the dimensions.
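The "single value plus dimensions" idea is essentially run-length encoding. Below is a minimal sketch using base R's rle() on a small, hypothetical all-zero matrix (the 10000 x 80 size is arbitrary): the whole matrix collapses to one run, and storing the dimensions next to it is enough to rebuild it. save() itself uses gzip by default, which is a different algorithm (LZ77 plus Huffman coding), but it exploits the same kind of redundancy.

```r
# Hypothetical stand-in: an all-zero matrix, much smaller than the real data
m <- matrix(0, nrow = 10000, ncol = 80)

# Run-length encoding: each run of identical values is stored as (value, length)
runs <- rle(as.vector(m))
length(runs$values)   # a single run: one value (0) ...
runs$lengths          # ... repeated 800000 times

# Keep the dimensions alongside the runs, then rebuild the matrix from them
encoded <- list(runs = runs, dim = dim(m))
decoded <- matrix(inverse.rle(encoded$runs),
                  nrow = encoded$dim[1], ncol = encoded$dim[2])
identical(decoded, m)

# The encoded form is tiny compared with the matrix itself
print(object.size(m))
print(object.size(encoded))
```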

Your regression data frame contains many distinct values, and they are real numbers (I'm assuming p-values, coefficients, etc.). This makes it much less compressible.
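
A small simulation can reproduce the pattern from the question. The sketch below assumes, purely for illustration, that the SNP data are doubles taking only the values 0, 1 and 2 and that the regression output behaves like rnorm() draws. Both data frames, snp_like and reg_like, are hypothetical and far smaller than 6820000 x 80. They have identical object.size() values, but the saved file for the real-valued data should be several times larger.

```r
set.seed(42)
n <- 100000; p <- 10   # hypothetical dimensions, much smaller than the question's

# SNP-like data frame: doubles taking only the values 0, 1, 2
snp_like <- as.data.frame(matrix(as.numeric(sample(0:2, n * p, replace = TRUE)),
                                 nrow = n))
# Regression-like data frame: full-precision real numbers
reg_like <- as.data.frame(matrix(rnorm(n * p), nrow = n))

f_snp <- tempfile(fileext = ".RData")
f_reg <- tempfile(fileext = ".RData")
save(snp_like, file = f_snp)   # default gzip compression
save(reg_like, file = f_reg)

# Same in-memory size ...
print(object.size(snp_like))
print(object.size(reg_like))
# ... but very different file sizes after compression
cat("SNP-like file:  ", file.size(f_snp), "bytes\n")
cat("regression file:", file.size(f_reg), "bytes\n")
```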