Reading big data in R with read.big.matrix


Question

I am reading a dataset of dimension 3131875*5 into R using read.big.matrix. My data has both character and numeric columns, including a date variable. The command I am using is:

as1 <- read.big.matrix("C:/Documents and Settings/Arundhati.Mukherjee/My Documents/Arundhati/big data/MB07_Arundhati/sample2.txt",
                       header=TRUE, 
                       backingfile="session.bin",
                       descriptorfile="session.desc",
                       type = NA)

But type = NA is not accepted in this case, and I get an error:

Error in filebacked.big.matrix(nrow = nrow, ncol = ncol, type = type,  : 
  Problem creating filebacked matrix.
In addition: Warning messages:
1: In na.omit(as.integer(firstLineVals)) : NAs introduced by coercion
2: In na.omit(as.double(firstLineVals)) : NAs introduced by coercion
3: In read.big.matrix("C:/Documents and Settings/Arundhati.Mukherjee/My Documents/Arundhati/big data/MB07_Arundhati/sample2.txt",  :
  Because type was not specified, we chose double based on the first line of data.

I need to know what type should be here. I tried options such as double, but that throws the same error.

Please help.

Recommended answer

From ?read.big.matrix:

Files must contain only one atomic type (all integer, for example).

Therefore, you won't be able to read in data that mixes character, numeric, integer, date, and other types. You could preprocess the file first, for instance using a different program to convert the character variables to integer representations (similar to converting to a factor in R).
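As a rough illustration of that kind of preprocessing, here is a minimal sketch in base R. It assumes a comma-separated file small enough to pass through a data frame once; the sample rows and the column names (id, region, date, amount, units) are invented for the example and are not taken from the question's file. Character values are mapped to integer codes, dates to days since 1970-01-01, and the all-numeric result is written out for read.big.matrix. The final call only runs if the bigmemory package is installed.

```r
# Hypothetical stand-in for the mixed-type input file (column names are invented).
raw_path <- tempfile(fileext = ".txt")
writeLines(c(
  "id,region,date,amount,units",
  "1,north,2012-01-15,10.5,3",
  "2,south,2012-01-16,7.25,1",
  "3,north,2012-02-01,3.0,4"
), raw_path)

# Read every column as character so nothing is coerced silently.
df <- read.csv(raw_path, colClasses = "character", stringsAsFactors = FALSE)

# Character column -> integer codes; keep the levels so codes can be decoded later.
region_levels <- sort(unique(df$region))
df$region <- match(df$region, region_levels)

# Date column -> number of days since 1970-01-01.
df$date <- as.numeric(as.Date(df$date, format = "%Y-%m-%d"))

# Remaining columns -> numeric.
df$id <- as.numeric(df$id)
df$amount <- as.numeric(df$amount)
df$units <- as.numeric(df$units)

# Write a file that now contains a single atomic type.
numeric_path <- tempfile(fileext = ".txt")
write.table(df, numeric_path, sep = ",", row.names = FALSE, quote = FALSE)

# Read it as a big.matrix of doubles, if bigmemory is available.
if (requireNamespace("bigmemory", quietly = TRUE)) {
  big <- bigmemory::read.big.matrix(numeric_path, sep = ",",
                                    header = TRUE, type = "double")
}
```

The original labels can be recovered with region_levels[code], and the dates with as.Date(days, origin = "1970-01-01"). For a file too large to hold in a data frame, the same mapping would have to be applied in chunks or by an external script.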

On the bigmemory website there is an example of preprocessing data with a Python script that converts character information to integers. The script is written for a specific dataset, but you could use it as a guideline for your own data.
