How to expand a large dataframe in R

Views: 167 Posted: 2017/7/13 20:23:51 Tags: r plyr expand reshape dplyr


Question

I have a data frame:

df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4), 
  date = c("1985-06-19", "1985-06-19", "1985-06-19", "1985-08-01", 
           "1985-08-01", "1990-06-19", "1990-06-19", "1990-06-19", 
           "1990-06-19", "2000-05-12"), 
  spp = c("a", "b", "c", "c", "d", "b", "c", "d", "a", "b"),
  y = rpois(10, 5))

   id       date spp y
1   1 1985-06-19   a 6
2   1 1985-06-19   b 3
3   1 1985-06-19   c 7
4   2 1985-08-01   c 7
5   2 1985-08-01   d 6
6   3 1990-06-19   b 5
7   3 1990-06-19   c 4
8   3 1990-06-19   d 4
9   3 1990-06-19   a 6
10  4 2000-05-12   b 6

I want to expand it so that every combination of id and spp is present, with y = 0 for every combination that is not currently in the dataframe. The dataframe currently has about 100,000 rows and 15 columns. When expanded it would have about 300,000 rows (there are 17 unique values of spp in my actual dataset).

For every value of id the date is the same (e.g. when id = 2, date is always 1985-08-01). In my real dataset all the columns except spp and y are determined by the id.
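The assumption that id determines every column other than spp and y can be checked before expanding. The sketch below uses only the id and date columns of the toy data; the names n_dates and id_info are illustrative and do not come from the original post.

```r
# Toy id/date columns copied from the example data frame
df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4),
  date = c("1985-06-19", "1985-06-19", "1985-06-19", "1985-08-01",
           "1985-08-01", "1990-06-19", "1990-06-19", "1990-06-19",
           "1990-06-19", "2000-05-12"),
  stringsAsFactors = FALSE)

# Count the distinct dates seen for each id; every count should be 1
n_dates <- tapply(df$date, df$id, function(x) length(unique(x)))
all(n_dates == 1)

# If the check passes, one row per id holds all id-level information
id_info <- unique(df[, c("id", "date")])
```

With more id-level columns, the same check could be repeated per column, and id_info would then carry all of them.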

I would like to end up with the following:

   id       date spp y
   1 1985-06-19   a 6
   1 1985-06-19   b 3
   1 1985-06-19   c 7
   1 1985-06-19   d 0*
   2 1985-08-01   a 0*
   2 1985-08-01   b 0*
   2 1985-08-01   c 7
   2 1985-08-01   d 6
   3 1990-06-19   b 5
   3 1990-06-19   c 4
   3 1990-06-19   d 4
   3 1990-06-19   a 6
   4 2000-05-12   a 0*
   4 2000-05-12   b 6
   4 2000-05-12   c 0*
   4 2000-05-12   d 0*

* indicates an added row

I will likely have to do this in the future with potentially much larger data frames, so a quick, efficient (in time and memory) way to do this would be appreciated, but any solution would satisfy me. I figure there should be ways to use the dplyr, data.table, or reshape packages, but I'm not very familiar with any of them. I'm not sure whether it would be easiest to expand just the columns id, spp, and y, then do a left_join() or merge() to recombine date (and all the other variables in my real dataframe) based on id?
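The expand-then-merge idea described above can be sketched in base R as follows. This is one possible approach under stated assumptions, not a benchmarked or definitive answer: the y values are fixed to those in the printed example (the original uses rpois), and the names id_info, grid, and expanded are illustrative.

```r
# Toy data from the question, with y fixed to the printed values
df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4),
  date = c("1985-06-19", "1985-06-19", "1985-06-19", "1985-08-01",
           "1985-08-01", "1990-06-19", "1990-06-19", "1990-06-19",
           "1990-06-19", "2000-05-12"),
  spp = c("a", "b", "c", "c", "d", "b", "c", "d", "a", "b"),
  y = c(6, 3, 7, 7, 6, 5, 4, 4, 6, 6),
  stringsAsFactors = FALSE)

# 1. One row per id holding the columns that depend only on id
id_info <- unique(df[, c("id", "date")])

# 2. Every id x spp combination
grid <- expand.grid(id = unique(df$id), spp = unique(df$spp),
                    stringsAsFactors = FALSE)

# 3. Attach the observed y; combinations absent from df get NA, then 0
expanded <- merge(grid, df[, c("id", "spp", "y")],
                  by = c("id", "spp"), all.x = TRUE)
expanded$y[is.na(expanded$y)] <- 0

# 4. Re-attach the id-level columns and tidy the row order
expanded <- merge(id_info, expanded, by = "id")
expanded <- expanded[order(expanded$id, expanded$spp),
                     c("id", "date", "spp", "y")]
rownames(expanded) <- NULL
```

One caveat: step 3 also turns any NA already present in y into 0, so genuinely missing observations would need separate handling. If the tidyr package is available, complete() together with nesting() and its fill argument is designed for this kind of expansion and may be worth comparing on larger data.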
