Creating a data.frame from a larger data.frame based on index values

Question

I have two small data.frames (d1 and d2), shown below. In d1, the column post varies across rows (length(unique(d1$post)) == 1L gives FALSE).

From d1, I would like to form the following data.frame (ALWAYS: items with suffix 1, e.g. mpre1, come from the control == FALSE subset of the data frame, and items with suffix 2, e.g. mpre2, come from the control == TRUE subset):

# Desired output from `d1` (4 rows x 6 columns):
#  mpre1 sdpre1 n1 mpre2 sdpre2 n2
#1  0.31   0.39 20  0.23   0.39 18 ##group=1,control=F&T,outcome=1 
#2  3.54   1.21 20  3.08   1.57 18 ##group=1,control=F&T,outcome=2
#3  0.16   0.27 19  0.23   0.39 18 ##group=2,control=F&T,outcome=1
#4  2.85   1.99 19  3.08   1.57 18 ##group=2,control=F&T,outcome=2

In d2, the column post does NOT vary across rows (length(unique(d2$post)) == 1L gives TRUE). From d2, I would like to form the following data.frame:

# Desired output from `d2` (4 rows x 6 columns):
#  mpre1 sdpre1 n1 mpre2 sdpre2 n2
#1  81.6   10.8 73 80.50 11.20  80 ##group=1,control=F&T,outcome=1
#2  85.7   13.7 66 90.30  6.60  74 ##group=1,control=F&T,outcome=2
#3  81.4   10.9 72 80.50 11.20  80 ##group=2,control=F&T,outcome=1
#4  90.4    8.2 61 90.30  6.60  74 ##group=2,control=F&T,outcome=2

The index values (for group and outcome) used to extract the above vectors from d1 or d2 are given by the following (shown for d1; use d2 in its place for d2):

with(subset(d1,!control),rev(expand.grid(outcome=unique(outcome),group=unique(group)))) 
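To see what that expression returns, here is a minimal base R sketch on a hand-typed copy of only the key columns of d1 (group, control, outcome); typing them in is an assumption made so the example runs without downloading the CSV. expand.grid() varies its first argument fastest, and rev() then reverses the column order so that group comes first:

```r
# Hand-typed copy of the key columns of d1 (all other columns omitted)
d1_keys <- data.frame(
  group   = rep(c(1, 2, 3), each = 4),
  control = rep(c(FALSE, FALSE, TRUE), each = 4),
  outcome = rep(c(1, 1, 2, 2), times = 3)
)

# One row per group x outcome in the control == FALSE subset;
# rev() on a data.frame reverses its columns (outcome, group -> group, outcome)
idx <- with(subset(d1_keys, !control),
            rev(expand.grid(outcome = unique(outcome), group = unique(group))))
idx
#   group outcome
# 1     1       1
# 2     1       2
# 3     2       1
# 4     2       2
```

Each row of idx corresponds to one row of the desired output, in the same order.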

I have a list of these data.frames, so a functional BASE R answer would be highly appreciated (d1 and d2 are below).

(d1 = read.csv("https://raw.githubusercontent.com/rnorouzian/m2/main/g.csv"))
#   study.name group  n mpre sdpre mpos sdpos post control outcome
#1      Diab_a     1 20 0.31  0.39 0.02  0.06    1   FALSE       1
#2      Diab_a     1 20 0.31  0.39 0.05  0.08    2   FALSE       1
#3      Diab_a     1 20 3.54  1.21 1.38  0.89    1   FALSE       2
#4      Diab_a     1 20 3.54  1.21 1.38  0.55    2   FALSE       2
#5      Diab_a     2 19 0.16  0.27 0.12  0.19    1   FALSE       1
#6      Diab_a     2 19 0.16  0.27 0.03  0.06    2   FALSE       1
#7      Diab_a     2 19 2.85  1.99 1.22  0.43    1   FALSE       2
#8      Diab_a     2 19 2.85  1.99 1.94  1.12    2   FALSE       2
#9      Diab_a     3 18 0.23  0.39 0.07  0.12    1    TRUE       1
#10     Diab_a     3 18 0.23  0.39 0.06  0.09    2    TRUE       1
#11     Diab_a     3 18 3.08  1.57 1.53  0.64    1    TRUE       2
#12     Diab_a     3 18 3.08  1.57 1.93  0.61    2    TRUE       2

(d2 = read.csv("https://raw.githubusercontent.com/rnorouzian/m2/main/g2.csv"))
#  study.name group  n mpre sdpre mpos sdpos post control outcome
#1  Dlsk_Krlr     1 73 81.6  10.8 83.1  11.1    1   FALSE       1
#2  Dlsk_Krlr     1 66 85.7  13.7 88.8  10.5    1   FALSE       2
#3  Dlsk_Krlr     2 72 81.4  10.9 85.0   8.1    1   FALSE       1
#4  Dlsk_Krlr     2 61 90.4   8.2 91.2   7.6    1   FALSE       2
#5  Dlsk_Krlr     3 80 80.5  11.2 80.8  10.7    1    TRUE       1
#6  Dlsk_Krlr     3 74 90.3   6.6 89.6   6.3    1    TRUE       2
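Comparing the two listings, d1 repeats each group and outcome once per post value, and the pre-test columns (n, mpre, sdpre) are identical in both copies. A minimal base R check of this, on a hand-typed copy of d1 without study.name and the post-test columns mpos and sdpos (an assumption, to avoid downloading the CSV): dropping post and calling unique() leaves one row per group and outcome, the same shape as d2.

```r
# Hand-typed copy of d1's pre-test columns plus `post`
d1 <- data.frame(
  group   = rep(1:3, each = 4),
  n       = rep(c(20, 19, 18), each = 4),
  mpre    = c(0.31, 0.31, 3.54, 3.54, 0.16, 0.16, 2.85, 2.85,
              0.23, 0.23, 3.08, 3.08),
  sdpre   = c(0.39, 0.39, 1.21, 1.21, 0.27, 0.27, 1.99, 1.99,
              0.39, 0.39, 1.57, 1.57),
  post    = rep(1:2, times = 6),
  control = rep(c(FALSE, FALSE, TRUE), each = 4),
  outcome = rep(c(1, 1, 2, 2), times = 3)
)

# Drop `post`; rows that differed only in `post` become exact duplicates
pre_only <- unique(d1[setdiff(names(d1), "post")])
nrow(pre_only)  # 6: one row per group x outcome, like d2
```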

Recommended answer

Is this what you want?

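library(tidyverse)
# note: `group` is not selected; treatment groups are told apart by their
# position inside the list-columns created by pivot_wider()
d1 %>% select(n, mpre, sdpre, control, outcome, post) %>% 
  unique %>%
  # FALSE/TRUE -> 1/2, used as the column-name suffix
  mutate(control = control + 1) %>%
  # one row per outcome x post; values_fn = list keeps all groups in a cell
  pivot_wider(values_from = c(mpre, sdpre, n), names_from = control, names_glue = '{.value}{control}', 
              values_fn = list) %>%
  # post == 1 rows keep the first element (group 1), post == 2 rows the last (group 2)
  mutate(across(ends_with('1') | ends_with('2'), ~ifelse(post == 1, map_dbl(., first),
                                                         map_dbl(., last)))) %>%
  arrange(post) %>% 
  select(ends_with('1'), ends_with('2'))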

# A tibble: 4 x 6
  mpre1 sdpre1    n1 mpre2 sdpre2    n2
  <dbl>  <dbl> <dbl> <dbl>  <dbl> <dbl>
1  0.31   0.39    20  0.23   0.39    18
2  3.54   1.21    20  3.08   1.57    18
3  0.16   0.27    19  0.23   0.39    18
4  2.85   1.99    19  3.08   1.57    18

For d2:

d2 %>% select(n, mpre, sdpre, control, outcome, post) %>% 
  mutate(control = control + 1) %>%
  pivot_wider(values_from = c(mpre, sdpre, n), names_from = control, 
              names_glue = '{.value}{control}', values_fn = list) %>%
  unnest(everything()) %>%
  select(ends_with('1'), ends_with('2'))

# A tibble: 4 x 6
  mpre1 sdpre1    n1 mpre2 sdpre2    n2
  <dbl>  <dbl> <int> <dbl>  <dbl> <int>
1  81.6   10.8    73  80.5   11.2    80
2  81.4   10.9    72  80.5   11.2    80
3  85.7   13.7    66  90.3    6.6    74
4  90.4    8.2    61  90.3    6.6    74

If the strategy used for d2 is close to what you expect, you can do the same for d1 (the trailing unique removes the rows duplicated by post):

d1 %>% select(n, mpre, sdpre, control, outcome, post) %>% 
  mutate(control = control + 1) %>%
  pivot_wider(values_from = c(mpre, sdpre, n), names_from = control, 
              names_glue = '{.value}{control}', values_fn = list) %>%
  unnest(everything()) %>%
  select(ends_with('1'), ends_with('2')) %>% unique

# A tibble: 4 x 6
  mpre1 sdpre1    n1 mpre2 sdpre2    n2
  <dbl>  <dbl> <int> <dbl>  <dbl> <int>
1  0.31   0.39    20  0.23   0.39    18
2  0.16   0.27    19  0.23   0.39    18
3  3.54   1.21    20  3.08   1.57    18
4  2.85   1.99    19  3.08   1.57    18
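Since the question asks for a functional base R answer, here is one possible sketch. It assumes each study has exactly one control group (so the suffix-2 values for an outcome are taken from the first control == TRUE row with that outcome) and that n, mpre and sdpre are constant across post within a group and outcome, as they are in d1 and d2. The helper name pre_pairs() is hypothetical, and the data are hand-typed copies of the relevant columns. It reuses the index expression from the question and looks rows up with match(), which returns the first match, so it does not need to branch on whether post varies:

```r
# Hypothetical helper: build the 6-column table of pre-test values.
# Assumes one control group per study and pre values constant across `post`.
pre_pairs <- function(d) {
  trt <- d[!d$control, ]   # suffix 1: control == FALSE
  ctl <- d[d$control, ]    # suffix 2: control == TRUE

  # index values (group, outcome), as in the question
  idx <- with(trt, rev(expand.grid(outcome = unique(outcome),
                                   group = unique(group))))

  # match() returns the FIRST matching row, which skips the
  # duplicate rows that a varying `post` creates
  i1 <- match(paste(idx$group, idx$outcome), paste(trt$group, trt$outcome))
  i2 <- match(idx$outcome, ctl$outcome)

  data.frame(mpre1 = trt$mpre[i1], sdpre1 = trt$sdpre[i1], n1 = trt$n[i1],
             mpre2 = ctl$mpre[i2], sdpre2 = ctl$sdpre[i2], n2 = ctl$n[i2])
}

# Hand-typed copies of the relevant columns of d1 and d2
d1 <- data.frame(
  group   = rep(1:3, each = 4),
  n       = rep(c(20, 19, 18), each = 4),
  mpre    = c(0.31, 0.31, 3.54, 3.54, 0.16, 0.16, 2.85, 2.85,
              0.23, 0.23, 3.08, 3.08),
  sdpre   = c(0.39, 0.39, 1.21, 1.21, 0.27, 0.27, 1.99, 1.99,
              0.39, 0.39, 1.57, 1.57),
  post    = rep(1:2, times = 6),
  control = rep(c(FALSE, FALSE, TRUE), each = 4),
  outcome = rep(c(1, 1, 2, 2), times = 3)
)
d2 <- data.frame(
  group   = rep(1:3, each = 2),
  n       = c(73, 66, 72, 61, 80, 74),
  mpre    = c(81.6, 85.7, 81.4, 90.4, 80.5, 90.3),
  sdpre   = c(10.8, 13.7, 10.9, 8.2, 11.2, 6.6),
  post    = 1,
  control = rep(c(FALSE, FALSE, TRUE), each = 2),
  outcome = rep(1:2, times = 3)
)

# Apply to a list of data.frames
res <- lapply(list(d1, d2), pre_pairs)
res[[1]]  # values match the desired output for d1
res[[2]]  # values match the desired output for d2
```

With the real list of data.frames read via read.csv(), the extra columns (study.name, mpos, sdpos) are simply ignored. If a study had several control groups, the i2 lookup would need a group key as well.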
