Data grouping and sub-grouping by column variable in R

Question

I am working on data collection in R on Windows 7.

The given data has these columns:

  var1    var2   value

I need to group by var1 and then, within each var1, group by var2.

The output should then be column vectors of the values that share the same var1 and var2. Here, var1 and var2 act like keys.

Example:

   var1    var2   value
   1          56       649578   
   2          17       357835
   1          88       572397
   2          90       357289
   1          56       427352   
   2          17       498455
   1          88       354623
   2          90       678658

The result should be:

   var1    var2   value
   1          56       649578   
   1          56       427352   
   1          88       354623
   1          88       572397
   2          17       357835
   2          17       498455
   2          90       357289
   2          90       678658

I also need to write the values to a CSV file as follows.

For var1 = 1:

   649578   354623
   427352   572397

For var1 = 2:

  357835   357289
  498455   678658

In addition, I need to write the full rows to a CSV file as follows.

For var1 = 1:

   1          56       649578   
   1          56       427352   
   1          88       354623
   1          88       572397

For var1 = 2:

   2          17       357835
   2          17       498455
   2          90       357289
   2          90       678658

How can I do this?

I found some related posts, but none of them is directly applicable.

Update: How can I select and print the values that are associated with each unique var2?

Is there a dictionary data structure in R?

Recommended answer

I believe this is relatively close to what you are looking for, though not quite the same. It should provide some help:

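library(reshape2)  # dcast()
library(plyr)      # ddply()

# Example data from the question
dat <- data.frame(var1 = c(1, 2, 1, 2, 1, 2, 1, 2),
                  var2 = c(56, 17, 88, 90, 56, 17, 88, 90),
                  value = c(649578, 357835, 572397, 357289, 427352, 498455, 354623, 678658))

# Sort by var1, then by var2
dat <- dat[order(dat$var1, dat$var2), ]

# Number the values within each (var1, var2) group
dat <- ddply(dat, .(var1, var2), summarize, seq1 = seq_along(value), value = value)

# Reshape to wide: one row per (var1, var2), one column per sequence number
dat.new.new <- dcast(dat, var1 + var2 ~ seq1, value.var = "value")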

The second assignment to dat, the one that uses order(), sorts the rows by var1 and then var2 as you requested. The dat.new.new data frame, with one row per var1/var2 pair and one column per value, is close to the layout you were looking for.
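
To get from the sorted data to the per-var1 CSV files described in the question, one possible approach in base R is sketched below. This is not part of the original answer. The file names (var1_1_long.csv and so on) and the temporary output directory are made up for illustration. The sketch also assumes that every var2 group within a given var1 holds the same number of values, so that the value columns line up.

```r
# Example data from the question
dat <- data.frame(
  var1  = c(1, 2, 1, 2, 1, 2, 1, 2),
  var2  = c(56, 17, 88, 90, 56, 17, 88, 90),
  value = c(649578, 357835, 572397, 357289, 427352, 498455, 354623, 678658)
)

# Sort by var1, then var2 (rows keep their original order within a group)
dat <- dat[order(dat$var1, dat$var2), ]

# Hypothetical output location; replace with a real directory as needed
out_dir <- tempdir()

# One data frame per distinct var1 value, named "1", "2", ...
by_var1 <- split(dat, dat$var1)

for (key in names(by_var1)) {
  piece <- by_var1[[key]]

  # Long layout: the var1, var2, value rows for this var1
  write.csv(piece,
            file.path(out_dir, paste0("var1_", key, "_long.csv")),
            row.names = FALSE)

  # Wide layout: one column of values per var2
  # (assumes equal group sizes; data.frame() stops with an error otherwise)
  wide <- data.frame(split(piece$value, piece$var2), check.names = FALSE)
  write.csv(wide,
            file.path(out_dir, paste0("var1_", key, "_wide.csv")),
            row.names = FALSE)
}
```

Within each var2 column the values appear in the order they had in the input, so add dat$value to the order() call if a different ordering is wanted.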

Bonus points for catching the KidCudi reference.
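
On the dictionary question raised in the update: base R has no type called a dictionary, but a named list (or an environment) is commonly used as a key-value lookup. As a rough sketch, split() can build a named list of value vectors keyed by the var1/var2 combination. The "1.56" key format is what split() produces when it is given a list of two grouping variables, and the names lookup and by_var2 are purely illustrative.

```r
# Example data from the question
dat <- data.frame(
  var1  = c(1, 2, 1, 2, 1, 2, 1, 2),
  var2  = c(56, 17, 88, 90, 56, 17, 88, 90),
  value = c(649578, 357835, 572397, 357289, 427352, 498455, 354623, 678658)
)

# Named list keyed by "var1.var2"; drop = TRUE omits combinations
# that never occur in the data (for example var1 = 1 with var2 = 17)
lookup <- split(dat$value, list(dat$var1, dat$var2), drop = TRUE)

# Look up the values stored under var1 = 1, var2 = 56
lookup[["1.56"]]

# Values associated with each unique var2, ignoring var1
by_var2 <- split(dat$value, dat$var2)
by_var2[["90"]]
```

A named list like this is looked up by character key, so numeric keys have to be converted first, for example with paste(1, 56, sep = ".").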
