Creating a data.frame from a larger data.frame based on index values
Question
I have two small data.frames (`d1` & `d2`) below. In `d1`, the column `post` varies across the rows (`length(unique(d1$post)) == 1L` gives `FALSE`).
From `d1`, I wonder how to form the following data.frame (ALWAYS, items with a `1` suffix (e.g. `mpre1`) come from the `control == F` subset of the data frame, and items with a `2` suffix (e.g. `mpre2`) come from the `control == T` subset):
# Desired output from `d1` (4 rows x 6 columns):
# mpre1 sdpre1 n1 mpre2 sdpre2 n2
#1 0.31 0.39 20 0.23 0.39 18 ##group=1,control=F&T,outcome=1
#2 3.54 1.21 20 3.08 1.57 18 ##group=1,control=F&T,outcome=2
#3 0.16 0.27 19 0.23 0.39 18 ##group=2,control=F&T,outcome=1
#4 2.85 1.99 19 3.08 1.57 18 ##group=2,control=F&T,outcome=2
In `d2`, the column `post` does NOT vary across the rows (`length(unique(d2$post)) == 1L` gives `TRUE`). From `d2`, I wonder how to form the following data.frame:
# Desired output from `d2` (4 rows x 6 columns):
# mpre1 sdpre1 n1 mpre2 sdpre2 n2
#1 81.6 10.8 73 80.50 11.20 80 ##group=1,control=F&T,outcome=1
#2 85.7 13.7 66 90.30 6.60 74 ##group=1,control=F&T,outcome=2
#3 81.4 10.9 72 80.50 11.20 80 ##group=2,control=F&T,outcome=1
#4 90.4 8.2 61 90.30 6.60 74 ##group=2,control=F&T,outcome=2
The index values (for `group` & `outcome`) used to extract the above vectors from either `d1` or `d2` are given by the following (shown for `d1`; substitute `d2` when working with `d2`):
with(subset(d1,!control),rev(expand.grid(outcome=unique(outcome),group=unique(group))))
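To see what that index grid evaluates to, here is a minimal sketch on a hypothetical inline stand-in for the `group`, `control` and `outcome` columns of `d1` (values copied from the printout of `d1` further down), so it runs without downloading the CSV:

```r
# Hypothetical inline stand-in for the group/control/outcome columns of `d1`
# (values copied from the printout of d1; other columns omitted).
d1 <- data.frame(
  group   = rep(1:3, each = 4),
  control = rep(c(FALSE, TRUE), c(8, 4)),
  outcome = rep(c(1, 1, 2, 2), 3)
)

# expand.grid() varies its first argument fastest, so outcome cycles within
# group; rev() on a data.frame reverses the column order, putting `group` first.
idx <- with(subset(d1, !control),
            rev(expand.grid(outcome = unique(outcome), group = unique(group))))
idx
# rows: group 1/outcome 1, group 1/outcome 2, group 2/outcome 1, group 2/outcome 2
```

Each row of `idx` corresponds to one row of the desired output, in the same order.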
I have a list of these data.frames, so a functional BASE R answer is highly appreciated (`d1` & `d2` are below).
(d1 = read.csv("https://raw.githubusercontent.com/rnorouzian/m2/main/g.csv"))
# study.name group n mpre sdpre mpos sdpos post control outcome
#1 Diab_a 1 20 0.31 0.39 0.02 0.06 1 FALSE 1
#2 Diab_a 1 20 0.31 0.39 0.05 0.08 2 FALSE 1
#3 Diab_a 1 20 3.54 1.21 1.38 0.89 1 FALSE 2
#4 Diab_a 1 20 3.54 1.21 1.38 0.55 2 FALSE 2
#5 Diab_a 2 19 0.16 0.27 0.12 0.19 1 FALSE 1
#6 Diab_a 2 19 0.16 0.27 0.03 0.06 2 FALSE 1
#7 Diab_a 2 19 2.85 1.99 1.22 0.43 1 FALSE 2
#8 Diab_a 2 19 2.85 1.99 1.94 1.12 2 FALSE 2
#9 Diab_a 3 18 0.23 0.39 0.07 0.12 1 TRUE 1
#10 Diab_a 3 18 0.23 0.39 0.06 0.09 2 TRUE 1
#11 Diab_a 3 18 3.08 1.57 1.53 0.64 1 TRUE 2
#12 Diab_a 3 18 3.08 1.57 1.93 0.61 2 TRUE 2
(d2 = read.csv("https://raw.githubusercontent.com/rnorouzian/m2/main/g2.csv"))
# study.name group n mpre sdpre mpos sdpos post control outcome
#1 Dlsk_Krlr 1 73 81.6 10.8 83.1 11.1 1 FALSE 1
#2 Dlsk_Krlr 1 66 85.7 13.7 88.8 10.5 1 FALSE 2
#3 Dlsk_Krlr 2 72 81.4 10.9 85.0 8.1 1 FALSE 1
#4 Dlsk_Krlr 2 61 90.4 8.2 91.2 7.6 1 FALSE 2
#5 Dlsk_Krlr 3 80 80.5 11.2 80.8 10.7 1 TRUE 1
#6 Dlsk_Krlr 3 74 90.3 6.6 89.6 6.3 1 TRUE 2
Answer
Is this what you want?
library(tidyverse)
d1 %>%
  select(n, mpre, sdpre, control, outcome, post) %>%
  unique %>%
  # FALSE/TRUE -> 1/2, used as the column-name suffix
  mutate(control = control + 1) %>%
  # values_fn = list keeps every matching value as a list-cell
  pivot_wider(values_from = c(mpre, sdpre, n), names_from = control,
              names_glue = '{.value}{control}', values_fn = list) %>%
  # post == 1 rows take the first list element (group 1), post == 2 rows the last (group 2)
  mutate(across(ends_with('1') | ends_with('2'),
                ~ ifelse(post == 1, map_dbl(., first), map_dbl(., last)))) %>%
  arrange(post) %>%
  select(ends_with('1'), ends_with('2'))
# A tibble: 4 x 6
mpre1 sdpre1 n1 mpre2 sdpre2 n2
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.31 0.39 20 0.23 0.39 18
2 3.54 1.21 20 3.08 1.57 18
3 0.16 0.27 19 0.23 0.39 18
4 2.85 1.99 19 3.08 1.57 18
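Since the question asks for base R, the index grid given in the question can also drive the extraction directly. A minimal sketch, assuming the pre-test values (`n`, `mpre`, `sdpre`) repeat across `post` and that each study has a single control group; the inline `d1` is a hypothetical stand-in copied from the question's printout (`study.name`, `mpos`, `sdpos` omitted):

```r
# Hypothetical inline stand-in for `d1` (pre-test columns only).
d1 <- data.frame(
  group   = rep(1:3, each = 4),
  n       = rep(c(20, 19, 18), each = 4),
  mpre    = rep(c(0.31, 3.54, 0.16, 2.85, 0.23, 3.08), each = 2),
  sdpre   = rep(c(0.39, 1.21, 0.27, 1.99, 0.39, 1.57), each = 2),
  post    = rep(1:2, 6),
  control = rep(c(FALSE, TRUE), c(8, 4)),
  outcome = rep(c(1, 1, 2, 2), 3)
)

# Index grid from the question: one row per (group, outcome) with control == FALSE
idx <- with(subset(d1, !control),
            rev(expand.grid(outcome = unique(outcome), group = unique(group))))

# For each (group, outcome) pair, take the first matching treatment row and the
# first control row with the same outcome (pre-test values repeat across post).
rows <- Map(function(g, o) {
  trt <- d1[!d1$control & d1$group == g & d1$outcome == o, ][1, ]
  ctl <- d1[d1$control & d1$outcome == o, ][1, ]
  data.frame(mpre1 = trt$mpre, sdpre1 = trt$sdpre, n1 = trt$n,
             mpre2 = ctl$mpre, sdpre2 = ctl$sdpre, n2 = ctl$n)
}, idx$group, idx$outcome)

out <- do.call(rbind, rows)
out
```

The rows come out in the order of `idx`, i.e. the order of the desired output.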
And for `d2`:
d2 %>%
  select(n, mpre, sdpre, control, outcome, post) %>%
  mutate(control = control + 1) %>%
  pivot_wider(values_from = c(mpre, sdpre, n), names_from = control,
              names_glue = '{.value}{control}', values_fn = list) %>%
  unnest(everything()) %>%
  select(ends_with('1'), ends_with('2'))
# A tibble: 4 x 6
mpre1 sdpre1 n1 mpre2 sdpre2 n2
<dbl> <dbl> <int> <dbl> <dbl> <int>
1 81.6 10.8 73 80.5 11.2 80
2 81.4 10.9 72 80.5 11.2 80
3 85.7 13.7 66 90.3 6.6 74
4 90.4 8.2 61 90.3 6.6 74
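The approach used for `d2` returns the rows ordered by outcome and then by group (rows 2 and 3 are swapped relative to the desired output). If that is close enough to what you need, you can do the same for `d1`: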
d1 %>%
  select(n, mpre, sdpre, control, outcome, post) %>%
  mutate(control = control + 1) %>%
  pivot_wider(values_from = c(mpre, sdpre, n), names_from = control,
              names_glue = '{.value}{control}', values_fn = list) %>%
  unnest(everything()) %>%
  select(ends_with('1'), ends_with('2')) %>%
  unique
# A tibble: 4 x 6
mpre1 sdpre1 n1 mpre2 sdpre2 n2
<dbl> <dbl> <int> <dbl> <dbl> <int>
1 0.31 0.39 20 0.23 0.39 18
2 0.16 0.27 19 0.23 0.39 18
3 3.54 1.21 20 3.08 1.57 18
4 2.85 1.99 19 3.08 1.57 18
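For the list of data.frames mentioned in the question, a functional base R sketch could wrap the reshaping in one function and `lapply()` it. This is a minimal sketch, not part of the answer above: `pre_wide` is a hypothetical helper name, and it assumes `control` is logical, the pre-test columns repeat across `post`, and each study has exactly one control group. The inline `d1`/`d2` are hypothetical stand-ins copied from the question's printouts.

```r
# Reshape one study's data.frame into mpre1/sdpre1/n1/mpre2/sdpre2/n2.
# Suffix 1 = control == FALSE rows, suffix 2 = control == TRUE rows.
pre_wide <- function(d) {
  # Dropping post/mpos/sdpos collapses rows that differ only by post time
  pre <- unique(d[c("group", "outcome", "control", "n", "mpre", "sdpre")])
  trt <- pre[!pre$control, c("group", "outcome", "mpre", "sdpre", "n")]
  ctl <- pre[pre$control, c("outcome", "mpre", "sdpre", "n")]
  names(trt)[3:5] <- paste0(names(trt)[3:5], "1")
  names(ctl)[2:4] <- paste0(names(ctl)[2:4], "2")
  # Pair every treatment row with the control row for the same outcome
  out <- merge(trt, ctl, by = "outcome")
  out <- out[order(out$group, out$outcome),
             c("mpre1", "sdpre1", "n1", "mpre2", "sdpre2", "n2")]
  rownames(out) <- NULL
  out
}

# Hypothetical inline stand-ins (pre-test columns only).
d1 <- data.frame(
  group   = rep(1:3, each = 4),
  n       = rep(c(20, 19, 18), each = 4),
  mpre    = rep(c(0.31, 3.54, 0.16, 2.85, 0.23, 3.08), each = 2),
  sdpre   = rep(c(0.39, 1.21, 0.27, 1.99, 0.39, 1.57), each = 2),
  post    = rep(1:2, 6),
  control = rep(c(FALSE, TRUE), c(8, 4)),
  outcome = rep(c(1, 1, 2, 2), 3)
)
d2 <- data.frame(
  group   = c(1, 1, 2, 2, 3, 3),
  n       = c(73, 66, 72, 61, 80, 74),
  mpre    = c(81.6, 85.7, 81.4, 90.4, 80.5, 90.3),
  sdpre   = c(10.8, 13.7, 10.9, 8.2, 11.2, 6.6),
  post    = 1,
  control = c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE),
  outcome = c(1, 2, 1, 2, 1, 2)
)

results <- lapply(list(d1 = d1, d2 = d2), pre_wide)
results
```

Because `unique()` removes the rows that differ only by `post`, the same function handles both the `d1` case (`post` varies) and the `d2` case (`post` constant) without branching. If a study had more than one control group, `merge()` would return every treatment/control pairing, so that case would need extra handling.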