Importing only every Nth row from a .csv file in R
Question
Just a quick question: is there a way to use read.csv to import only every Nth row from a large file?
For example, a 50-60 million line file where you only need every 4th row, starting at row 2.
I thought about incorporating the 'seq' function, but I am not sure whether that is possible.
Any suggestions?
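The 'seq' idea from the question can be sketched in plain R without holding the whole file in memory: read the lines in chunks and keep the data rows whose index would fall in seq(start, <total rows>, by = n). This is a minimal sketch, not a drop-in solution: read_every_nth is a hypothetical helper name, the chunk size is an arbitrary default, and it assumes one CSV record per physical line (no quoted fields containing embedded newlines).

```r
# Hypothetical helper: keep the header plus every n-th data row starting at
# data row `start`, reading the file in chunks instead of all at once.
# Assumes one CSV record per line (no quoted fields with embedded newlines).
read_every_nth <- function(path, n = 4L, start = 2L, chunk_size = 100000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  header <- readLines(con, n = 1L)    # line 1 holds the column names
  pieces <- list()
  offset <- 0L                        # data rows consumed so far
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break
    idx <- offset + seq_along(lines)  # global data-row numbers for this chunk
    # Same rows as seq(start, <total rows>, by = n), without knowing the total
    keep <- idx >= start & (idx - start) %% n == 0
    pieces[[length(pieces) + 1L]] <- lines[keep]
    offset <- offset + length(lines)
  }
  read.csv(text = c(header, unlist(pieces, use.names = FALSE)))
}

# Example: 1000 data rows, keep rows 2, 6, 10, ...
f <- tempfile(fileext = ".csv")
write.csv(1:1000, file = f)
res <- read_every_nth(f, n = 4, start = 2)
nrow(res)  # 250
```

Only the kept lines are accumulated and parsed by read.csv at the end, so for the 50-60 million line case roughly a quarter of the rows would need to fit in memory.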
Recommended answer
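For a large data file, the best option is to filter out the unneeded rows before they are imported into R. The simplest way to do this is with OS command-line tools such as sed, awk or grep, and to read their output through a pipe() connection. The following example keeps the header line plus every 4th data row starting at row 2. Because the header occupies line 1 of the file, data row k sits on file line k + 1, so the wanted rows (2, 6, 10, ...) are file lines 3, 7, 11, ..., i.e. the lines where NR % 4 == 3 in awk. Keeping line 1 matters: without it, read.csv would treat the first selected data line as the header and lose that row.
write.csv(1:1000, file = 'test.csv')
# Keep line 1 (the header) and every 4th data row starting at data row 2,
# i.e. file lines 3, 7, 11, ... (data row k is on file line k + 1).
# Printing whole lines (rather than awk's $1) also avoids truncating fields that contain spaces.
file.pipe <- pipe("awk 'NR == 1 || NR % 4 == 3' test.csv")
res <- read.csv(file.pipe)
nrow(res)
# [1] 250
head(res)
#    X  x
# 1  2  2
# 2  6  6
# 3 10 10
# 4 14 14
# 5 18 18
# 6 22 22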
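The awk filter can also be parameterised for any N and any starting row. This is a minimal sketch assuming a Unix-like shell with awk on the PATH; read_every_nth_awk is a hypothetical helper name, and it keeps line 1 as the header and assumes one CSV record per line.

```r
# Hypothetical helper: let awk keep the header (line 1) plus every n-th data
# row starting at data row `start`; data row k is on file line k + 1.
# Assumes a Unix-like shell with awk available on the PATH.
read_every_nth_awk <- function(path, n = 4L, start = 2L) {
  prog <- "NR == 1 || (NR - 1 >= s && (NR - 1 - s) % n == 0)"
  cmd <- sprintf("awk -v n=%d -v s=%d %s %s",
                 as.integer(n), as.integer(start),
                 shQuote(prog), shQuote(path))
  read.csv(pipe(cmd))  # read.csv opens and closes the pipe itself
}

# Example: 1000 data rows, keep rows 2, 6, 10, ...
f <- tempfile(fileext = ".csv")
write.csv(1:1000, file = f)
res <- read_every_nth_awk(f, n = 4, start = 2)
nrow(res)  # 250
```

Passing n and start with awk's -v option keeps the awk program itself fixed, so only two integers and the quoted file name are interpolated into the shell command; shQuote() protects paths containing spaces.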