Importing and extracting a random sample from a large .CSV in R

Views: 174 | Posted: 2018/8/1 11:42:36 | Tags: r, csv, import, statistics, subsampling

Question

I'm doing some analysis in R where I need to work with some large datasets (10-20 GB each, stored as .csv files and read with the read.csv function).

Because I also need to merge the large .csv files with other data frames and transform them, I don't have the computing power or memory to import an entire file.

I was wondering if anyone knows of a way to import a random percentage of the rows of the csv.

I have seen some examples where people imported the entire file and then used a separate function to create another data frame that is a sample of the original. However, I am hoping for something a little less memory-intensive.

Recommended answer

I don't think there is a good R tool for reading a file in a random way (perhaps it could be added as an extension to read.table, or to fread from the data.table package).

Using perl you can do this task easily. For example, to read a random 1% of the lines of your file, you can do this:

# big_file is the path to the .csv; shQuote() protects paths containing spaces
xx <- system(paste("perl -ne 'print if (rand() < .01)'", shQuote(big_file)), intern = TRUE)

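Here I am calling perl from R using system(). xx now contains only about 1% of the lines of your file, as a character vector of raw lines. Note that the header row is sampled like any other line, so it will usually be missing from xx.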
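Since xx holds raw text lines rather than a data frame, it still has to be parsed. The following is a minimal sketch of one way to do that, not part of the original answer: the lines in xx and the column names are made-up stand-ins for illustration, and in practice the header could be read from the file itself with readLines(big_file, n = 1).

```r
# Made-up stand-in for the perl output: raw CSV lines, header row not included
xx <- c("3,0.52,b", "17,1.40,a", "42,-0.30,c")

# Assumed header for this toy data; for a real file it could be obtained
# cheaply with: header <- readLines(big_file, n = 1)
header <- "id,value,group"

# read.csv() accepts the lines directly through its `text` argument
df <- read.csv(text = c(header, xx), stringsAsFactors = FALSE)
str(df)
```

Prepending the header keeps the column names even though the random sample almost never contains the first line of the file.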

You can wrap all this in a function:

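# Return roughly `percent` (a fraction, e.g. 0.01 for 1%) of the lines of big_file
read_partial_rand <-
  function(big_file, percent) {
    # perl prints each line with probability `percent`
    cmd <- paste0("perl -ne 'print if (rand() < ", percent, ")'")
    # shQuote() keeps paths containing spaces or special characters intact
    cmd <- paste(cmd, shQuote(big_file))
    system(cmd, intern = TRUE)
  }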
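If perl is not available, a similar effect can probably be achieved in plain R by reading the file in chunks through a connection and keeping each line with a given probability. This is a different technique from the perl one-liner above, shown only as a rough sketch; the function name sample_csv_rows and its chunk_size parameter are hypothetical, and the temporary file stands in for the large .csv.

```r
# Hypothetical helper: keep each data row with probability `p`, reading
# `chunk_size` lines at a time so the whole file is never held in memory.
sample_csv_rows <- function(path, p, chunk_size = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  header <- readLines(con, n = 1L)        # always keep the header row
  kept <- character(0)
  repeat {
    chunk <- readLines(con, n = chunk_size)
    if (length(chunk) == 0L) break        # end of file reached
    # Independent coin flip per line, as in the perl version
    kept <- c(kept, chunk[runif(length(chunk)) < p])
  }
  # Parse the header plus the surviving lines as one small CSV
  read.csv(text = c(header, kept), stringsAsFactors = FALSE)
}

# Demo on a small temporary file standing in for the large .csv
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:1000, value = rnorm(1000)), tmp, row.names = FALSE)
set.seed(1)
sampled <- sample_csv_rows(tmp, p = 0.01)
str(sampled)
```

Like the perl version, this keeps each row independently, so the sample size is only approximately p times the number of rows. It also assumes that no quoted field contains an embedded line break.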
