r - Error: Text after processing all cols in fread (data.table)

Views: 23  Published: 2021/4/28 19:42:09  Tags: r, data.table


Problem description

I tried to import a text file into R (3.4.0). The file actually contains 4 columns, but the 4th column is mostly empty until somewhere past row 200,000. I used fread() from the data.table package (version 1.10.4):

fread("test.txt",fill = TRUE, sep = "\t", quote = "", header = FALSE)

I got this error message:

Error in fread("test.txt", fill = TRUE, sep = "\t", quote = "", header = FALSE) : 
Expecting 3 cols, but line 258088 contains text after processing all cols. Try again with fill=TRUE. Another reason could be that fread's logic in distinguishing one or more fields having embedded sep='  ' and/or (unescaped) '\n' characters within unbalanced unescaped quotes has failed. If quote='' doesn't help, please file an issue to figure out if the logic could be improved.

I checked the file, and row 258088 does contain additional text ("8-4") in the 4th column.
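The manual check above can also be scripted. The following is a minimal sketch, assuming a small hypothetical stand-in for test.txt (the real file is not available here); it uses base R's count.fields() to count tab-separated fields on every line and report which lines have more than 3.

```r
# Hypothetical stand-in for test.txt: eight lines, one of which has a 4th field.
tmp <- tempfile(fileext = ".txt")
writeLines(c(rep("a\tb\tc", 5), "a\tb\tc\t8-4", rep("a\tb\tc", 2)), tmp)

# Count fields per line; quote = "" and comment.char = "" treat quote
# characters and '#' as ordinary text.
n_fields <- count.fields(tmp, sep = "\t", quote = "", comment.char = "")

table(n_fields)                    # how many lines have 3 fields, 4 fields, ...
wide_lines <- which(n_fields > 3)  # line numbers with extra fields
wide_lines
```

Running the same count on the real file would show whether row 258088 is the only line with a 4th field or merely the first one fread() stumbled on.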

Nevertheless, fill = TRUE did not solve this as I expected. I thought fread() might be determining the number of columns incorrectly because the additional column first appears very late in the file. So I tried this:

fread("test.txt", fill = TRUE, header = FALSE, sep = "\t", skip = 250000)

The error persisted. On the other hand,

fread("test.txt", fill = TRUE, header = FALSE, sep = "\t", skip = 258080)

ran without error.
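One way to cross-check the file independently of fread()'s column detection is to read it with base R and state the column count explicitly. This is only a sketch on a hypothetical stand-in file, not a fix for fread() itself. It relies on the documented behaviour of read.table(), which takes the number of columns from col.names when that is longer than what the first five lines suggest.

```r
# Hypothetical stand-in: the 4-field line appears after the first five lines.
tmp <- tempfile(fileext = ".txt")
writeLines(c(rep("a\tb\tc", 10), "a\tb\tc\t8-4", "a\tb\tc"), tmp)

# Widest line in the file, measured over all lines rather than a sample.
n_cols <- max(count.fields(tmp, sep = "\t", quote = "", comment.char = ""))

# Supplying col.names of that length fixes the column count up front;
# fill = TRUE pads the shorter rows.
dat <- read.table(tmp, sep = "\t", quote = "", header = FALSE, fill = TRUE,
                  col.names = paste0("V", seq_len(n_cols)),
                  comment.char = "", stringsAsFactors = FALSE)

dim(dat)
dat[11, ]
```

If this reads the real file cleanly with 4 columns, the file itself is well formed and the problem lies in how the column count is detected.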

I thought I had found the reason, but something odd happened when I tested with a dummy file generated by:

write.table(matrix(c(1:990000), nrow = 330000), "test2.txt", sep = "\t", row.names = FALSE)

with "8-4" added to the 4th column of row 250000 using Excel. When read with fread():

fread("test2.txt", fill = TRUE, header = FALSE, sep = "\t")

It worked fine with no error message, which suggests that an additional column appearing late in the file does not necessarily trigger the error.
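For reproducibility, the Excel step can be replaced by a scripted edit. The sketch below is an assumption-laden variant of the dummy file above: make_dummy() is a hypothetical helper, the file is scaled down to 1000 rows to keep it quick, and the extra field is appended to file line 750 (counting the header line, since it is unclear whether the Excel row number included it).

```r
# Hypothetical helper: write a 3-column tab-separated file, then append an
# extra "8-4" field to one line.
make_dummy <- function(path, n_rows = 330000, extra_line = 250000) {
  write.table(matrix(seq_len(3 * n_rows), nrow = n_rows), path,
              sep = "\t", row.names = FALSE)
  lines <- readLines(path)  # line 1 is the header written by write.table()
  lines[extra_line] <- paste(lines[extra_line], "8-4", sep = "\t")
  writeLines(lines, path)
  invisible(path)
}

tmp <- make_dummy(tempfile(fileext = ".txt"), n_rows = 1000, extra_line = 750)

# Confirm that exactly one line now has 4 fields.
n_fields <- count.fields(tmp, sep = "\t", quote = "", comment.char = "")
extra_at <- which(n_fields == 4)
extra_at
```

Calling make_dummy() with its defaults gives a file of the same size as test2.txt, which avoids any side effects Excel may introduce when saving.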

I also tried changing the encoding ("Latin-1" and "UTF-8") and the quote argument, but neither helped.

Now I feel clueless. Hopefully I have done my homework and provided enough reproducible information. Thank you for your help.

For additional environment information, my sessionInfo() is:

R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.5

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] zh_TW.UTF-8/zh_TW.UTF-8/zh_TW.UTF-8/C/zh_TW.UTF-8/zh_TW.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
  [1] dplyr_0.5.0            purrr_0.2.2.2          readr_1.1.1            tidyr_0.6.3           
  [5] tibble_1.3.3           ggplot2_2.2.1          tidyverse_1.1.1        stringr_1.2.0         
  [9] microbenchmark_1.4-2.1 data.table_1.10.4     

loaded via a namespace (and not attached):
[1] Rcpp_0.12.11     cellranger_1.1.0 compiler_3.4.0   plyr_1.8.4       forcats_0.2.0   
[6] tools_3.4.0      jsonlite_1.5     lubridate_1.6.0  nlme_3.1-131     gtable_0.2.0    
[11] lattice_0.20-35  rlang_0.1.1      psych_1.7.5      DBI_0.6-1        parallel_3.4.0  
[16] haven_1.0.0      xml2_1.1.1       httr_1.2.1       hms_0.3          grid_3.4.0      
[21] R6_2.2.1         readxl_1.0.0     foreign_0.8-68   reshape2_1.4.2   modelr_0.1.0    
[26] magrittr_1.5     scales_0.4.1     rvest_0.3.2      assertthat_0.2.0 mnormt_1.5-5    
[31] colorspace_1.3-2 stringi_1.1.5    lazyeval_0.2.0   munsell_0.4.3    broom_0.4.2
