Preventing column-class inference in fread()

Published: 2017/3/12 10:22:25. Tags: r, data.table, read.table


Question

 
Is there a way for fread to mimic the behaviour of read.table, whereby the class of each variable is set by the data that is actually read in?

I have numeric data with a few comments underneath the main data. When I use fread to read in the data, the columns are converted to character. However, by setting nrow in read.table I can stop this behaviour. Is this possible in fread? (I would prefer not to alter the raw data or make an amended copy.) Thanks

An example
d <- data.frame(x=c(1:100, NA, NA, "fff"), y=c(1:100, NA,NA,NA)) 
write.csv(d, "test.csv",  row.names=F)

in_d <- read.csv("test.csv", nrow=100, header=T)
in_dt <- data.table::fread("test.csv", nrow=100)
This produces:
> str(in_d)
'data.frame':   100 obs. of  2 variables:
 $ x: int  1 2 3 4 5 6 7 8 9 10 ...
 $ y: int  1 2 3 4 5 6 7 8 9 10 ...
> str(in_dt)
Classes ‘data.table’ and 'data.frame':  100 obs. of  2 variables:
 $ x: chr  "1" "2" "3" "4" ...
 $ y: int  1 2 3 4 5 6 7 8 9 10 ...
 - attr(*, ".internal.selfref")=<externalptr>
As a workaround, I thought I would be able to use read.table to read in one line, get the classes and set colClasses, but I am misunderstanding something.
cl <- read.csv("test.csv", nrow=1,  header=T)
cols <- unname(sapply(cl, class))
in_dt <- data.table::fread("test.csv", nrow=100, colClasses=cols)
str(in_dt)
Using Windows 8.1
R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Solution
Option 1: Using a system command

fread() allows the use of a system command in its first argument. We can use it to remove the quotes in the first column of the file.
indt <- data.table::fread("cat test.csv | tr -d '\"'", nrows = 100)
str(indt)
# Classes ‘data.table’ and 'data.frame':    100 obs. of  2 variables:
#  $ x: int  1 2 3 4 5 6 7 8 9 10 ...
#  $ y: int  1 2 3 4 5 6 7 8 9 10 ...
#  - attr(*, ".internal.selfref")=<externalptr> 
The system command cat test.csv | tr -d '\"' explained:


cat test.csv reads the file to standard output
| is a pipe, using the output of the previous command as input for the next command 
tr -d '\"' deletes (-d) all occurrences of double quotes ('\"') from the current input
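On Windows, `cat` and `tr` are typically not available from R's shell unless a Unix toolset such as Rtools is on the PATH, so the pipeline can fail there. A minimal sketch of the same idea done in R itself, assuming data.table 1.11.0 or later (those versions take raw lines through `fread(text = ...)` and a shell pipeline through `fread(cmd = ...)`): it recreates the question's file in a temporary path, shows that `write.csv()` quoted every value of `x`, strips the double quotes with `gsub()`, and parses only the header plus the first 100 data lines. The names `path`, `raw_lines`, `cleaned` and `indt3` are illustrative.

```r
library(data.table)

# Recreate the question's example in a temporary file (the path is arbitrary)
d <- data.frame(x = c(1:100, NA, NA, "fff"), y = c(1:100, NA, NA, NA))
path <- tempfile(fileext = ".csv")
write.csv(d, path, row.names = FALSE)

# write.csv() quotes character (and factor) columns: x is stored as "1", "2", ...
raw_lines <- readLines(path)
head(raw_lines, 3)

# R-level equivalent of `tr -d '"'`: drop every double quote on every line
cleaned <- gsub("\"", "", raw_lines, fixed = TRUE)

# Header plus the first 100 data lines, i.e. the rows that nrows = 100 targets
indt3 <- fread(text = cleaned[1:101])
str(indt3)
```

Cutting the lines before parsing means the trailing `NA` and `fff` rows never reach `fread()`, so the result does not depend on how a particular data.table version samples rows for type detection.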




Option 2: Coercion after reading

Since Option 1 doesn't seem to work on your system, another possibility is to read the file as you did and then convert the x column with type.convert().
library(data.table)
indt2 <- fread("test.csv", nrows = 100)[, x := type.convert(x, as.is = TRUE)]
str(indt2)
# Classes ‘data.table’ and 'data.frame':    100 obs. of  2 variables:
#  $ x: int  1 2 3 4 5 6 7 8 9 10 ...
#  $ y: int  1 2 3 4 5 6 7 8 9 10 ...
#  - attr(*, ".internal.selfref")=<externalptr> 
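If several columns come back as character, the same coercion can be applied to all of them in one step. A sketch using the usual data.table `:=` with `.SDcols` idiom; the small table `dt` is a made-up stand-in for an `fread()` result, not data from the question.

```r
library(data.table)

# Hypothetical fread() result: x should be numeric, z is genuine text
dt <- data.table(x = as.character(1:5), y = 1:5, z = c("a", "b", "c", "d", "e"))

# Names of the columns that are currently character
chr_cols <- names(dt)[vapply(dt, is.character, logical(1))]

# Convert those columns by reference; type.convert() picks the narrowest type,
# and as.is = TRUE keeps non-numeric columns as character rather than factor
dt[, (chr_cols) := lapply(.SD, type.convert, as.is = TRUE), .SDcols = chr_cols]
str(dt)
```

Only character columns are touched, so columns that `fread()` already typed correctly (here `y`) are left alone.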
Side note: I usually prefer to use type.convert() over as.numeric() to avoid the "NAs introduced by coercion" warning triggered in some cases.  For example,
x <- c("1", "4", "NA", "6")
as.numeric(x)
# [1]  1  4 NA  6
# Warning message:
# NAs introduced by coercion 
type.convert(x, as.is = TRUE)
# [1]  1  4 NA  6
But of course you can use as.numeric() as well.
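The difference matters when a column really does contain text, as the "fff" row in the question does. A small base-R sketch with made-up vectors: `type.convert()` returns the narrowest type that fits every value and leaves the vector as character when some value is not numeric, while `as.numeric()` always returns a double and turns such values into NA with a warning.

```r
# All values numeric (or the string "NA"): type.convert() chooses integer
ok <- type.convert(c("1", "4", "NA", "6"), as.is = TRUE)
class(ok)

# Decimal values are returned as double
dbl <- type.convert(c("1.5", "2"), as.is = TRUE)
class(dbl)

# One non-numeric value keeps the whole vector as character, unchanged
mixed <- type.convert(c("1", "2", "fff"), as.is = TRUE)
class(mixed)

# as.numeric() forces double and replaces "fff" with NA (warning suppressed here)
forced <- suppressWarnings(as.numeric(c("1", "2", "fff")))
```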



Note: This answer assumes data.table dev v1.9.5
