Splitting by two categories in R

Question

n <- 3
strata <- rep(1:4, each=n)
y <- rnorm(n =12)
x <- 1:12
category <- rep(c("A", "B", "C"), times = 4)
df <- cbind.data.frame(y, x, strata, category)

I want to first split my data into a list by "strata", and then split each data frame in that list again by "category". Finally, I want to regress y on x within each of the resulting data frames. (In this example each data frame has only one row, but in the actual data the strata have different sizes and contain different numbers of categories.)

Recommended answer

The canonical way in R is to use split:

L <- split(df, df[,c("strata","category")])
L
# $`1.A`
#           y x strata category
# 1 -1.120867 1      1        A
# $`2.A`
#           y x strata category
# 4 -1.023001 4      2        A
# $`3.A`
#           y x strata category
# 7 0.5411806 7      3        A
# $`4.A`
#           y  x strata category
# 10 1.546789 10      4        A
# $`1.B`
#           y x strata category
# 2 0.6730641 2      1        B
# $`2.B`
#           y x strata category
# 5 -1.466816 5      2        B
# $`3.B`
#            y x strata category
# 8 -0.1955617 8      3        B
# $`4.B`
#            y  x strata category
# 11 -0.660904 11      4        B
# $`1.C`
#            y x strata category
# 3 -0.9880206 3      1        C
# $`2.C`
#           y x strata category
# 6 0.4111802 6      2        C
# $`3.C`
#             y x strata category
# 9 -0.03311637 9      3        C
# $`4.C`
#            y  x strata category
# 12 0.6799109 12      4        C

The names of the 12-element list are (here) the values of the two categorical variables concatenated with a "." separator; these can easily be overridden manually.
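Since the question mentions that the real data has a different number of categories inside each stratum, it may help to see what split does with combinations that never occur. The sketch below uses a small made-up data frame (df2 and its values are illustrative, not from the original post): by default the unused combinations come back as zero-row data frames, drop = TRUE removes them, and the names are then rewritten by hand.

```r
# Hypothetical unbalanced data: stratum 1 has no "C", stratum 2 has no "B".
df2 <- data.frame(
  y        = c(1.2, 0.7, -0.3, 2.1, 0.4, 1.9),
  x        = 1:6,
  strata   = c(1, 1, 1, 2, 2, 2),
  category = c("A", "A", "B", "A", "C", "C")
)

# Default behaviour: every strata/category combination gets an element,
# including the ones with no rows.
L_all <- split(df2, df2[, c("strata", "category")])

# drop = TRUE keeps only the combinations that actually occur.
L_used <- split(df2, df2[, c("strata", "category")], drop = TRUE)

# Override the default "." separator in the names by hand.
names(L_used) <- gsub(".", "_", names(L_used), fixed = TRUE)
names(L_used)
```

Dropping the empty combinations up front is probably what you want before fitting models, since a zero-row data frame gives lm nothing to fit.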

From here, to do regression on every element, you'd likely do something like:

models <- lapply(L, function(x) lm(..., data=x))

(or whichever regression tool you are planning to use).
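As a concrete sketch of the split-then-lapply pattern, the example below assumes the formula y ~ x from the question and uses simulated data with three rows per group so that a slope can actually be estimated. The data frame d, the seed, and the names groups, models, and coefs are made up for illustration.

```r
set.seed(1)

# Hypothetical data: 2 strata x 2 categories, 3 rows in each combination.
d <- data.frame(
  strata   = rep(1:2, each = 6),
  category = rep(c("A", "B"), times = 6),
  x        = 1:12
)
d$y <- 2 * d$x + rnorm(12)

# Split by both variables, then fit one model per group.
groups <- split(d, d[, c("strata", "category")], drop = TRUE)
models <- lapply(groups, function(g) lm(y ~ x, data = g))

# Collect the coefficients into a matrix with one row per group.
coefs <- t(sapply(models, coef))
coefs
```

Naming the function argument g rather than x avoids confusing the data frame with its x column inside the formula.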

You can do this in one step if you'd like,

results <- by(df, df[,c("strata","category")], function(x) lm(..., data=x))

The benefit is that split and apply happen in a single call. The value returned by by can look a bit odd when printed, but it is really just a list with a special print method (print.by); you can still index it like a list as needed.
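To illustrate indexing a by result like a list, here is a minimal sketch. It again assumes the formula y ~ x from the question and simulated data; d, res, first, and m_2B are illustrative names. With two grouping variables the result is arranged as a strata-by-category grid, so an element can be picked either by position or by its pair of labels.

```r
set.seed(42)

# Hypothetical data: 2 strata x 2 categories, 3 rows in each combination.
d <- data.frame(
  strata   = rep(1:2, each = 6),
  category = rep(c("A", "B"), times = 6),
  x        = 1:12
)
d$y <- 1 + 0.5 * d$x + rnorm(12)

res <- by(d, d[, c("strata", "category")], function(g) lm(y ~ x, data = g))

# Positional access, as with any list.
first <- res[[1]]

# Access by the labels of the two grouping variables.
m_2B <- res[["2", "B"]]
coef(m_2B)

# The labels available along each dimension.
dimnames(res)
```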

Another way to do this in dplyr:

library(dplyr)
results <- df %>%
  group_by(strata, category) %>%
  summarize(model = list(lm(y ~ x)))
results
# # A tibble: 12 x 3
# # Groups:   strata [4]
#    strata category model 
#     <int> <chr>    <list>
#  1      1 A        <lm>  
#  2      1 B        <lm>  
#  3      1 C        <lm>  
#  4      2 A        <lm>  
#  5      2 B        <lm>  
#  6      2 C        <lm>  
#  7      3 A        <lm>  
#  8      3 B        <lm>  
#  9      3 C        <lm>  
# 10      4 A        <lm>  
# 11      4 B        <lm>  
# 12      4 C        <lm>  
results$model[[1]]
# Call:
# lm(formula = y ~ x)
# Coefficients:
# (Intercept)            x  
#      -1.121           NA  

As pointed out by Onyambu (thank you!), this works well (without data=) because we are explicitly listing the variables, and they will be found. If your regression uses ., for example, you may want to formalize it a little with

results <- df %>%
  group_by(strata, category) %>%
  summarize(model = list(lm(y ~ ., data = cur_data())))

y~x will work without it, but y~. will not, ergo data=cur_data().
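One caveat for newer installations: cur_data() was deprecated in dplyr 1.1.0 in favour of pick(). The sketch below is a possible equivalent under that assumption (dplyr 1.1.0 or later installed); the data frame d is simulated for illustration, and .groups = "drop" is added only to return an ungrouped result.

```r
library(dplyr)

set.seed(7)

# Hypothetical data: 2 strata x 2 categories, 3 rows in each combination.
d <- data.frame(
  strata   = rep(1:2, each = 6),
  category = rep(c("A", "B"), times = 6),
  x        = 1:12
)
d$y <- 3 - d$x + rnorm(12)

# pick(everything()) supplies the current group's non-grouping columns,
# so "." in the formula expands to x only.
results <- d %>%
  group_by(strata, category) %>%
  summarize(model = list(lm(y ~ ., data = pick(everything()))),
            .groups = "drop")

results$model[[1]]
```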
