R Programming: read.csv() skips lines unexpectedly
Question
I am trying to read a CSV file in R (under Linux) using read.csv(). After the function completes, the number of rows read into R is smaller than the number of lines in the CSV file (as reported by wc -l). Every time I read that specific file, the same lines are skipped. I checked the file for formatting errors, but everything looks fine.
But if I extract the skipped lines into a separate CSV file, R is able to read those very lines from that file.
I have not been able to find out what the problem could be. Any help is greatly appreciated.
Recommended answer
Here's an example of using count.fields to determine where to look and perhaps apply fixes. You have a modest number of lines that are 23 'fields' in width:
> table(count.fields("~/Downloads/bugs.csv", quote="", sep=","))

     2     23     30
   502     10 136532
> table(count.fields("~/Downloads/bugs.csv", sep=","))
# Just wanted to see if removing quote-recognition would help.... It didn't.

     2      4     10     12     20     22     23     25     28     30
 11308     24     20     33    642    251     10      2    170 124584
> which(count.fields("~/Downloads/bugs.csv", quote="", sep=",") == 23)
 [1] 104843 125158 127876 129734 130988 131456 132515 133048 136764
[10] 136765
I looked at the 23-field lines with:
> txt <- readLines("~/Downloads/bugs.csv")[
+     which(count.fields("~/Downloads/bugs.csv", quote="", sep=",") == 23)]
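Those lines contain "#" characters, which count.fields and read.table treat as comment markers by default, so the rest of each such line is dropped. With comment handling turned off, every line has 30 fields:
> table(count.fields("~/Downloads/bugs.csv", quote="", sep=",", comment.char=""))

    30
137044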
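The effect of the comment character can be reproduced on a small file. The following is a minimal sketch, not taken from the original data: the three-line file and its contents are made up for illustration, and only count.fields with its documented sep, quote and comment.char arguments is used.

```r
# Hypothetical three-line CSV (assumption, not the real bugs.csv);
# the second line has a "#" inside a field.
path <- tempfile(fileext = ".csv")
writeLines(c("1,alpha,ok",
             "2,see bug #42,open",
             "3,gamma,ok"), path)

# Default comment.char = "#": everything after "#" on line 2 is discarded,
# so that line appears to have fewer fields.
default_counts <- count.fields(path, sep = ",", quote = "")

# Comment handling switched off: all lines report the same field count.
fixed_counts <- count.fields(path, sep = ",", quote = "", comment.char = "")

print(default_counts)
print(fixed_counts)

# Line numbers whose field count differs from the widest line.
print(which(default_counts != max(default_counts)))
```

Comparing the two vectors line by line is a quick way to tell whether a short line is caused by a comment character rather than by a genuinely malformed record.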
So... use those settings (quote="", sep=",", comment.char="") in read.table and you should be "good to go".
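As a rough sketch of what such a call might look like, the example below reads a small made-up file with those settings and checks that no rows were lost. The file contents, the column names and header = TRUE are assumptions for illustration; the real file may have no header line.

```r
# Hypothetical CSV with a header (assumption); fields contain "#" and an
# apostrophe, both of which can confuse the default read.table settings.
path <- tempfile(fileext = ".csv")
writeLines(c("id,title,status",
             "1,alpha,ok",
             "2,see bug #42,open",
             "3,it's gamma,ok"), path)

# Same settings that gave a uniform field count with count.fields.
bugs <- read.table(path, header = TRUE, sep = ",", quote = "",
                   comment.char = "", stringsAsFactors = FALSE)

# Sanity check: one data row per line, minus the header line.
n_lines <- length(readLines(path))
stopifnot(nrow(bugs) == n_lines - 1)

print(bugs)
```

Note that quote="" disables quote handling entirely, so this only suits files whose fields do not rely on quotes to protect embedded commas.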