fread in R imports a large .csv file as a data frame with one row
Problem description
I'm importing a large .csv file into R (about 0.5 million rows), so I've been trying to use fread() from the data.table package as a faster alternative to read.table() and read.csv(). However, fread() returns a data frame with all of the data from the rows inside one row, even though it has the correct number of columns. I found a bug report from 2013 showing this is related to the integer64 data class:
http://r-forge.r-project.org/tracker/index.php?func=detail&aid=2786&group_id=240&atid=975
Are there any fixes or ways to get around this?
The .csv file I'm trying to read is entirely integers ranging from 0 - 10000, with no missing data. I'm using R version 2.15.2 on a Windows 7 computer, with version 1.8.8 of the data.table package.
The code I'm running is:
require(data.table)
fread("pre2012_alldatapoints.csv", sep = ",", header= TRUE)-> pre
head(pre)
1: 1 22 -105 22 -105
2: 2 22 -105 22 -105
3: 3 20 -105 20 -105
4: 4 21 -105 21 -105
5: 5 21 -105 21 -105
6: 6 21 -105 21 -105
dim(pre)
[1] 12299 5 #dim returns the correct number of dimensions
#this is a subset of the file I want to import that I've confirmed imports correctly with read.csv
pre[,1]
[1] 1 #but trying to print a column returns this
length(pre[,1])
[1] 1 #and length for any column returns a row length of 1
Thanks very much for your help!
Recommended answer
fread creates a data.table. The data.table package comes with a number of vignettes.
Your precise issue is addressed in FAQ 1.1 of the data.table FAQ, the very first FAQ!
By default, the second argument to [.data.table is an expression evaluated within the scope of the data.table.
Therefore pre[,1] evaluates 1 within the scope of pre. 1 is still 1. If you want to reference by column number, use with=FALSE:
pre[,1,with=FALSE]
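As a minimal sketch of the point above, the snippet below builds a small stand-in table (the column names id and lat and their values are hypothetical, not taken from the asker's file) and shows three ways to get at a column that do not depend on how a bare number in j is interpreted. As far as I know, later data.table releases (1.9.8 onward) also treat pre[,1] itself as column selection, so the one-element result described in the question is specific to older versions such as 1.8.8.

```r
library(data.table)

# Hypothetical stand-in for the imported file: three rows, two integer columns.
pre <- data.table(id = 1:3, lat = c(22L, 22L, 20L))

# Select by column number; the result is still a data.table with all rows.
first_col <- pre[, 1, with = FALSE]

# Extract the first column as a plain vector using list-style indexing.
id_vec <- pre[[1]]

# An unquoted column name in j is evaluated within the scope of the table,
# so this returns the lat column as a vector.
lat_vec <- pre[, lat]

nrow(first_col)   # row count of the selected column
length(id_vec)    # length of the extracted vector
```

If a plain vector is what you need for length() or similar functions, pre[[1]] is usually the more direct choice; with=FALSE keeps the result as a data.table.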