Ways to read only select columns from a file into R? (A happy medium between `read.table` and `scan`?)
Problem description
I have some very big delimited data files and I want to process only certain columns in R, without taking the time and memory to create a `data.frame` for the whole file.
The only options I know of are `read.table`, which is very wasteful when I only want a couple of columns, or `scan`, which seems too low level for what I want.
Is there a better option, either with pure R or perhaps by calling out to some other shell script to do the column extraction and then using `scan` or `read.table` on its output? (Which leads to the question: how do you call a shell script and capture its output in R?)
Recommended answer
Sometimes I do something like this when I have the data in a tab-delimited file:
df <- read.table(pipe("cut -f1,5,28 myFile.txt"))
That lets `cut` do the data selection, which it can do without using much memory at all.
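As a rough, self-contained sketch of the same idea: the snippet below writes a small hypothetical tab-delimited file (the temp file and its contents are made up for illustration), reads only fields 1 and 5 through `pipe()`, and also shows `system(..., intern = TRUE)` as one way to capture a shell command's output in R. It assumes a Unix-like system where `cut` is on the PATH.

```r
# Hypothetical sample data: 3 rows, 6 tab-separated columns, no header.
path <- tempfile(fileext = ".txt")
m <- matrix(1:18, nrow = 3, byrow = TRUE)
write.table(m, path, sep = "\t", row.names = FALSE, col.names = FALSE)

# cut splits on tabs by default; -f1,5 keeps only fields 1 and 5.
# shQuote() protects the path in case it contains spaces.
cmd <- paste("cut -f1,5", shQuote(path))

# Option 1: read the command's output through a pipe connection.
df <- read.table(pipe(cmd), sep = "\t")

# Option 2: capture the output as a character vector, then parse it.
lines <- system(cmd, intern = TRUE)
df2 <- read.table(text = lines, sep = "\t")
```

Both calls should yield a two-column data frame. The `pipe()` form avoids holding the raw text in R as an intermediate character vector, which is probably preferable for very large files.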
For a pure R version, see "Only read limited number of columns", which uses "NULL" in the `colClasses` argument to `read.table`.
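A minimal sketch of the `colClasses` approach, using a hypothetical six-column temp file created just for illustration: columns marked `"NULL"` are skipped, and `NA` leaves the type of a kept column to be guessed by `read.table`. The file is still parsed in full, but the skipped columns are not stored in the result.

```r
# Hypothetical sample data: 3 rows, 6 tab-separated columns, no header.
path <- tempfile(fileext = ".txt")
m <- matrix(1:18, nrow = 3, byrow = TRUE)
write.table(m, path, sep = "\t", row.names = FALSE, col.names = FALSE)

n_cols <- 6          # total number of columns in the file (assumed known)
keep <- c(1, 5)      # columns we actually want

# "NULL" (the string, not the NULL object) tells read.table to skip a column.
classes <- rep("NULL", n_cols)
classes[keep] <- NA  # NA means "guess the type" for the kept columns

df <- read.table(path, sep = "\t", colClasses = classes)
```

The kept columns retain their original positional names (`V1` and `V5` here), which makes it easy to see which fields were selected.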