R: subsetting a data frame into multiple data frames based on multiple column values

Question

I am trying to subset a data frame so that I get multiple data frames, one for each combination of values in several columns. Here is my example:

>df
  v1   v2   v3   v4   v5
   A    Z    1    10   12
   D    Y    10   12    8
   E    X    2    12   15
   A    Z    1    10   12
   E    X    2    14   16

The expected output looks something like this, where the data frame is split into multiple data frames based on columns v1 and v2:

>df1
 v3   v4   v5
  1   10   12
  1   10   12
>df2
 v3   v4   v5
 10   12    8
>df3
 v3   v4   v5
 2    12   15
 2    14   16

I have written code that works, but I don't think it is the best approach; there must be a better way. Assuming tab is the data.frame holding the initial data, here is my code:

v1Factors<-levels(factor(tab$v1))
v2Factors<-levels(factor(tab$v2))

for(i in 1:length(v1Factors)){
  for(j in 1:length(v2Factors)){
    subsetTab<-subset(tab, v1==v1Factors[i] & v2==v2Factors[j], select=c("v3", "v4", "v5"))
    print(subsetTab)
  }
}

Can someone suggest a better way to do this?

Recommended answer

You are looking for split:

split(df, with(df, interaction(v1,v2)), drop = TRUE)
$E.X
  v1 v2 v3 v4 v5
3  E  X  2 12 15
5  E  X  2 14 16

$D.Y
  v1 v2 v3 v4 v5
2  D  Y 10 12  8

$A.Z
  v1 v2 v3 v4 v5
1  A  Z  1 10 12
4  A  Z  1 10 12
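
The split() call above keeps v1 and v2 in every piece, whereas the question asks for only v3, v4 and v5. A minimal sketch of one way to get that, assuming the example data is rebuilt as below (the names value_cols and pieces are illustrative, not from the original answer):

```r
# Rebuild the example data from the question
df <- data.frame(
  v1 = c("A", "D", "E", "A", "E"),
  v2 = c("Z", "Y", "X", "Z", "X"),
  v3 = c(1, 10, 2, 1, 2),
  v4 = c(10, 12, 12, 10, 14),
  v5 = c(12, 8, 15, 12, 16)
)

# Split only the value columns; the grouping factor still comes from v1 and v2
value_cols <- c("v3", "v4", "v5")
pieces <- split(df[value_cols], interaction(df$v1, df$v2), drop = TRUE)

names(pieces)    # one element per observed (v1, v2) combination
pieces[["A.Z"]]  # rows 1 and 4 of df, without the v1 and v2 columns
```

Each element of pieces is an ordinary data.frame, so it can be accessed by name or looped over with lapply() rather than being assigned to separate variables such as df1, df2 and df3.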

As noted in the comments, any of the following would work:

library(microbenchmark)
microbenchmark(
  split(df, list(df$v1, df$v2), drop = TRUE),
  split(df, interaction(df$v1, df$v2), drop = TRUE),
  split(df, with(df, interaction(v1, v2)), drop = TRUE)
)


Unit: microseconds
                                                  expr      min        lq    median       uq      max neval
            split(df, list(df$v1, df$v2), drop = TRUE) 1119.845 1129.3750 1145.8815 1182.119 3910.249   100
     split(df, interaction(df$v1, df$v2), drop = TRUE)  893.749  900.5720  909.8035  936.414 3617.038   100
 split(df, with(df, interaction(v1, v2)), drop = TRUE)  895.150  902.5705  909.8505  927.128 1399.284   100

It appears that interaction is slightly faster (probably because f = list(...) is simply converted to an interaction inside the function).
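
Because the list form is converted to an interaction internally, the two spellings should produce the same pieces. A small sketch to check this on the example data (the variable names by_list and by_interaction are illustrative):

```r
# Rebuild the example data from the question
df <- data.frame(
  v1 = c("A", "D", "E", "A", "E"),
  v2 = c("Z", "Y", "X", "Z", "X"),
  v3 = c(1, 10, 2, 1, 2),
  v4 = c(10, 12, 12, 10, 14),
  v5 = c(12, 8, 15, 12, 16)
)

# Same grouping expressed two ways
by_list        <- split(df, list(df$v1, df$v2), drop = TRUE)
by_interaction <- split(df, interaction(df$v1, df$v2), drop = TRUE)

# Compare group names and contents
identical(names(by_list), names(by_interaction))
all.equal(by_list, by_interaction)
```

The choice between the two is therefore mostly a matter of readability; the timing difference in the benchmark is a fraction of a millisecond on this small data frame.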

Edit

If you just want to use the subset data.frames (rather than keep them), then I would suggest data.table for ease of coding:

library(data.table)

dt <- data.table(df)
dt[, plot(v4, v5), by = list(v1, v2)]
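
For comparison, a similar per-group computation can be sketched in base R by applying a function to each element of the split() result. The summary used here (the mean of v4 per group) is purely illustrative and stands in for the plot() call above; it is not part of the original answer.

```r
# Rebuild the example data from the question
df <- data.frame(
  v1 = c("A", "D", "E", "A", "E"),
  v2 = c("Z", "Y", "X", "Z", "X"),
  v3 = c(1, 10, 2, 1, 2),
  v4 = c(10, 12, 12, 10, 14),
  v5 = c(12, 8, 15, 12, 16)
)

# One data.frame per observed (v1, v2) combination
groups <- split(df, interaction(df$v1, df$v2), drop = TRUE)

# Apply a function to each piece; here, the mean of v4 within the group
group_means <- sapply(groups, function(g) mean(g$v4))
group_means
```

This keeps everything in base R at the cost of an explicit split() step, whereas the data.table form expresses the grouping and the computation in a single call.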
