Reading big data in R with read.big.matrix
Question
I am reading a data set of dimension 3131875 x 5 into R using read.big.matrix. My data has both character and numeric columns, including a date variable. The command I am using is:
as1 <- read.big.matrix("C:/Documents and Settings/Arundhati.Mukherjee/My Documents/Arundhati/big data/MB07_Arundhati/sample2.txt",
header=TRUE,
backingfile="session.bin",
descriptorfile="session.desc",
type = NA)
But type = NA is not accepted by R in this case, and I get an error:
Error in filebacked.big.matrix(nrow = nrow, ncol = ncol, type = type, :
Problem creating filebacked matrix.
In addition: Warning messages:
1: In na.omit(as.integer(firstLineVals)) : NAs introduced by coercion
2: In na.omit(as.double(firstLineVals)) : NAs introduced by coercion
3: In read.big.matrix("C:/Documents and Settings/Arundhati.Mukherjee/My Documents/Arundhati/big data/MB07_Arundhati/sample2.txt", :
Because type was not specified, we chose double based on the first line of data.
I need to know what type should be here. I tried options like double, but that throws the same error.
Please help.
Recommended answer
From ?read.big.matrix:
Files must contain only one atomic type (all integer, for example).
Therefore, you won't be able to read in data that mixes character, numeric, integer, date, and other types. You could preprocess the file first, for instance using a different program to convert the character variables to integer representations (similar to converting to a factor in R).
On the bigmemory website there's an example of preprocessing data using a Python script to change character information to integers. The script is written for a specific dataset, but perhaps you could use it as a guideline for your data.
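To make the preprocessing idea concrete, here is a minimal sketch in Python. It is not the script from the bigmemory website. The function name encode_to_integers and its parameters (text_cols, date_cols, date_format, delimiter) are hypothetical, and the comma delimiter and the yyyy-mm-dd date format are assumptions about the file that may not match yours. It streams the file row by row, replaces each distinct string with an integer code (much like factor levels in R), and turns dates into yyyymmdd integers.

```python
import csv
from datetime import datetime


def encode_to_integers(in_path, out_path, text_cols, date_cols,
                       date_format="%Y-%m-%d", delimiter=","):
    """Rewrite a delimited file so text and date columns hold integers.

    Hypothetical helper for illustration only.
    text_cols: header names whose strings become 1-based integer codes
    date_cols: header names whose dates become yyyymmdd integers
    Other columns are copied through unchanged.
    Returns the lookup tables so codes can be mapped back to strings.
    """
    codes = {name: {} for name in text_cols}
    with open(in_path, newline="") as src, \
            open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src, delimiter=delimiter)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames,
                                delimiter=delimiter)
        writer.writeheader()
        # Process one row at a time so the whole file never sits in memory.
        for row in reader:
            for name in text_cols:
                table = codes[name]
                # A value gets the next unused code the first time it appears.
                row[name] = table.setdefault(row[name], len(table) + 1)
            for name in date_cols:
                parsed = datetime.strptime(row[name], date_format)
                row[name] = (parsed.year * 10000
                             + parsed.month * 100
                             + parsed.day)
            writer.writerow(row)
    return codes
```

After a pass like this, every column is numeric, so the output file should be readable by read.big.matrix with a single type such as "integer" or "double", and with sep set to match the delimiter. The returned lookup tables let you translate the integer codes back to the original strings later.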