How to read unquoted extra \r with data.table::fread

Problem description

The data I have to process contains unquoted text with some extra \r characters. The files are big (500MB) and numerous (>600), and changing the export is not an option. The data might look like this:

A,B,C
blah,a,1
bloo,a\r,b
blee,c,d

  1. Is there a good way to process this with data.table's fread?
  2. Is there a better R function for reading CSVs that is similarly performant?

Repro

library(data.table)
csv<-"A,B,C\r\n
      blah,a,1\r\n
      bloo,a\r,b\r\n
      blee,c,d\r\n"
fread(csv)

Error in fread(csv) : Expected sep (',') but new line, EOF (or other non printing character) ends field 1 when detecting types from point 0: bloo,a

Advanced repro

The simple repro might be too trivial to give a sense of scale...

samplerecs<-c("blah,a,1","bloo,a\r,b","blee,c,d")
randomcsv<-paste0(c("A,B,C",rep(samplerecs,2000000)))
write(randomcsv,file = "sample.csv")

# Naive approach
fread("sample.csv")

# Akrun's approach with needing text read first
fread(gsub("\r\n|\r", "", paste0(randomcsv,collapse="\r\n")))
#>Error in file.info(input) :  file name conversion problem -- name too long?

# Julia's approach with needing text read first
readr::read_csv(gsub("\r\n|\r", "", paste0(randomcsv,collapse="\r\n")))
#> Error: C stack usage  48029706 is too close to the limit

Recommended answer

Further to @dirk-eddelbuettel & @nrussell's suggestions, one way of solving this is to pre-process the file. The processor could also be called within fread(), but here it is performed in separate steps:

samplerecs<-c("blah,a,1","bloo,a\r,b","blee,c,d")
randomcsv<-paste0(c("A,B,C",rep(samplerecs,2000000)))
write(randomcsv,file = "sample.csv")
# Remove errant `\r`'s with tr - shown here is the Windows R solution
shell("C:/Rtools/bin/tr.exe -d '\\r' < sample.csv > sampleNEW.csv")
fread("sampleNEW.csv")
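
For the "called within fread()" variant mentioned above, here is a minimal sketch. It assumes a data.table version whose fread() accepts the cmd= argument and that a tr executable is on the PATH (on Windows this would typically be the one shipped with Rtools, and the quoting may need adjusting). The file name sample_small.csv is only a small stand-in for the real export.

library(data.table)

# Small stand-in for the large export: the "bloo" record carries a stray "\r".
# writeBin() is used so the bytes land on disk exactly as given.
writeBin(charToRaw("A,B,C\nblah,a,1\nbloo,a\r,b\nblee,c,d\n"), "sample_small.csv")

# Delete every carriage return with tr and let fread() read the command's
# output directly, so no cleaned copy has to be written by hand.
# Assumption: tr is available on the PATH.
dt <- fread(cmd = "tr -d '\\r' < sample_small.csv")
print(dt)

Note that tr -d removes every \r in the file, including those that are part of \r\n line endings; fread() reads the resulting \n-terminated lines without trouble. If some fields legitimately contain carriage returns, this approach would drop those as well.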
