Ways to read only select columns from a file into R? (A happy medium between `read.table` and `scan`?)


Question

I have some very big delimited data files, and I want to process only certain columns in R without spending the time and memory to create a data.frame for the whole file.

The only options I know of are `read.table`, which is very wasteful when I only want a couple of columns, and `scan`, which seems too low-level for what I want.

Is there a better option, either in pure R or perhaps by calling out to a shell script that does the column extraction and then using `scan` or `read.table` on its output? (Which leads to the question of how to call a shell script and capture its output in R.)

Recommended answer

Sometimes I do something like this when I have the data in a tab-delimited file:

df <- read.table(pipe("cut -f1,5,28 myFile.txt"))

That lets `cut` do the column selection, which it can do without using much memory at all. R only ever sees the columns that `cut` passes through the pipe.
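To make the `pipe` approach concrete, here is a minimal, self-contained sketch. It assumes a Unix-like system where `cut` is on the PATH; the file, its column names, and the field numbers are made up for illustration and are not from the original answer.

```r
# Build a small tab-delimited file to stand in for the large one
# (hypothetical data, four columns).
path <- tempfile(fileext = ".txt")
demo <- data.frame(
  id    = 1:3,
  name  = c("a", "b", "c"),
  score = c(0.5, 1.5, 2.5),
  flag  = c(TRUE, FALSE, TRUE)
)
write.table(demo, path, sep = "\t", quote = FALSE, row.names = FALSE)

# cut keeps fields 1 and 3 of every line, including the header line.
# shQuote() protects the path in case it contains spaces.
cmd <- paste("cut -f1,3", shQuote(path))

# pipe() runs the command and exposes its output as a connection;
# read.table() opens the connection, reads it, and closes it again.
df <- read.table(pipe(cmd), header = TRUE, sep = "\t")
str(df)
```

This also covers the side question about capturing shell output: `pipe()` hands the command's output to any function that accepts a connection. For other delimiters, `cut -d` sets the separator, but `cut` does not understand quoting, so a comma-separated file with commas inside quoted fields would be split incorrectly.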

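For a pure R version that reads only a limited number of columns, use `NULL` in the `colClasses` argument to `read.table` (given as the string `"NULL"` for each column to skip).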
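A minimal sketch of the pure R variant follows. The file and its columns are hypothetical. `colClasses` takes one entry per column in the file: the string `"NULL"` skips a column, and `NA` lets `read.table` infer the type.

```r
# Build a small tab-delimited file to stand in for the large one
# (hypothetical data, four columns).
path <- tempfile(fileext = ".txt")
demo <- data.frame(
  id    = 1:3,
  name  = c("a", "b", "c"),
  score = c(0.5, 1.5, 2.5),
  flag  = c(TRUE, FALSE, TRUE)
)
write.table(demo, path, sep = "\t", quote = FALSE, row.names = FALSE)

# Keep columns 1 and 3, skip columns 2 and 4.
keep <- c(NA, "NULL", NA, "NULL")
df <- read.table(path, header = TRUE, sep = "\t", colClasses = keep)
str(df)
```

The skipped columns are never stored in the resulting data.frame, although R still has to scan every line of the file, so this mainly saves memory rather than read time.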
