Expand R dataframe based on multiple column and row criteria


Question

I have the following data frame in RStudio:

 DF1<-data.frame('X_F'=c(1,2,3,4,5, NA, NA, NA, 1,2,3,4,5), "X_A"=c(.1,.2,.3,.4,.5, NA, NA, NA, .2,.3,.4, .5,.6),"Y_F"=c(2,3,5,NA, 7, 1,3, 4, 1,NA,3,4,5), "Y_A"=c(.2,.3,.4,NA, .7, .1,.2,.7,.1,NA, .3,.4,.5),'ID'=c("A", "A", "A", "A", "A", "B", "B", "B", "C", "C", "C","C",'C'))

The data frame has 5 columns: an ID column that identifies each set, and two sets of parameters, X_F and Y_F, each with a corresponding column of A values, X_A and Y_A.

The data frame looks like this:

   X_F  X_A  Y_F  Y_A ID
   1    0.1   2   0.2  A
   2    0.2   3   0.3  A
   3    0.3   5   0.4  A
   4    0.4   NA  NA   A
   5    0.5   7   0.7  A
   NA   NA    1   0.1  B
   NA   NA    3   0.2  B
   NA   NA    4   0.7  B
   1   0.2    1   0.1  C
   2   0.3    NA  NA   C
   3   0.4    3   0.3  C
   4   0.5    4   0.4  C
   5   0.6    5   0.5  C

I want to obtain the following data frame by expanding the one above. The expanded data frame has an extra column called SF. The values of SF are derived from the range of the X_F and Y_F columns within each ID group, in steps of 1 (in the expected output below, SF starts at 1 for every ID).

     ID  SF   X_F  X_A   Y_F  Y_A
 1   A    1    1    0.1   1   NA
 2   A    2    2    0.2   2   0.2
 3   A    3    3    0.3   3   0.3
 4   A    4    4    0.4   4   NA
 5   A    5    5    0.5   5   0.4
 6   A    6    6    NA    6   NA
 7   A    7    7    NA    7   0.7
 8   B    1    1    NA    1   0.1
 9   B    2    2    NA    2   NA
 10  B    3    3    NA    3   0.2
 11  B    4    4    NA    4   0.7
 12  C    1    1    0.2   1   0.1
 13  C    2    2    0.3   2   NA
 14  C    3    3    0.4   3   0.3
 15  C    4    4    0.5   4   0.4
 16  C    5    5    0.6   5   0.5

I tried this approach to get the required result:

  library(dplyr)
  library(tidyr)
  DF1

  DF2 <- DF1 %>%
    group_by(ID) %>%
    mutate(SF = pmax(X_F, Y_F, na.rm = TRUE)) %>%
    complete(SF = full_seq(SF, 1))

Instead of the expected output above, I got the following:

   ID       SF   X_F   X_A   Y_F   Y_A
  <fct>   <dbl> <dbl> <dbl> <dbl> <dbl>
   A       2     1     0.1   2     0.2
   A       3     2     0.2   3     0.3
   A       4     4     0.4   NA     NA  
   A       5     3     0.3    5    0.4
   A       6    NA     NA    NA    NA  
   A       7     5     0.5   7     0.7
   B       1    NA     NA    1     0.1
   B       2    NA     NA    NA    NA  
   B       3    NA     NA    3     0.2
   B       4    NA     NA    4     0.7
   C       1     1     0.2   1     0.1
   C       2     2     0.3   NA    NA  
   C       3     3     0.4   3     0.3
   C       4     4     0.5   4     0.4
   C       5     5     0.6   5     0.5

Could someone help? I have been unable to solve this.

Recommended answer

Take the maximum of SF inside complete and use seq instead of full_seq, because full_seq only fills in the values between the observed minimum and maximum, whereas seq(max(x)) always starts at 1:

full_seq(2:4, 1) #gives
#[1] 2 3 4
#whereas
seq(max(2:4)) #gives
#[1] 1 2 3 4
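
The same comparison can be applied to the question's own data. The sketch below recomputes SF for ID "A" by hand; the variable names (sf_a, from_min, from_one, with_one) are illustrative and not part of the original answer.

```r
library(tidyr)

# SF values for ID "A" in the question: the row-wise maximum of X_F and Y_F
sf_a <- pmax(c(1, 2, 3, 4, 5), c(2, 3, 5, NA, 7), na.rm = TRUE)

from_min <- full_seq(sf_a, 1)        # spans min(sf_a) to max(sf_a), so 1 is left out
from_one <- seq(max(sf_a))           # always starts at 1
with_one <- full_seq(c(1, sf_a), 1)  # prepending 1 forces the range to start at 1

print(from_min)
print(from_one)
print(with_one)
```

Because the smallest SF in group "A" is 2, full_seq on its own never produces a row for SF = 1, which is the row missing from the question's output.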

So try:

library(dplyr)
library(tidyr)

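DF1 %>%
  group_by(ID) %>%
  # SF is the row-wise maximum of X_F and Y_F, ignoring NA
  mutate(SF = pmax(X_F, Y_F, na.rm = TRUE)) %>%
  # fill in every SF from 1 up to the group's maximum
  complete(SF = seq(max(SF)))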


#   ID       SF   X_F   X_A   Y_F   Y_A
#   <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 A         1    NA  NA      NA  NA  
# 2 A         2     1   0.1     2   0.2
# 3 A         3     2   0.2     3   0.3
# 4 A         4     4   0.4    NA  NA  
# 5 A         5     3   0.3     5   0.4
# 6 A         6    NA  NA      NA  NA  
# 7 A         7     5   0.5     7   0.7
# 8 B         1    NA  NA       1   0.1
# 9 B         2    NA  NA      NA  NA  
#10 B         3    NA  NA       3   0.2
#11 B         4    NA  NA       4   0.7
#12 C         1     1   0.2     1   0.1
#13 C         2     2   0.3    NA  NA  
#14 C         3     3   0.4     3   0.3
#15 C         4     4   0.5     4   0.4
#16 C         5     5   0.6     5   0.5


To get the same result with full_seq, you could prepend 1 to the vector:

DF1 %>%
  group_by(ID) %>%
  mutate(SF = pmax(X_F, Y_F, na.rm = TRUE)) %>%
  # including 1 in the vector makes full_seq start the range at 1
  complete(SF = full_seq(c(1, SF), 1))
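
Note that in the output shown above, X_A and Y_A stay on their original rows, whereas the expected table in the question lines X_A up with X_F == SF and Y_A with Y_F == SF. One possible way to get that alignment, offered as a sketch rather than as part of the original answer, is to split the two parameter sets, join them on ID and SF, and then fill in the missing SF values. The names x_part, y_part and expanded are made up for illustration.

```r
library(dplyr)
library(tidyr)

DF1 <- data.frame(
  X_F = c(1, 2, 3, 4, 5, NA, NA, NA, 1, 2, 3, 4, 5),
  X_A = c(.1, .2, .3, .4, .5, NA, NA, NA, .2, .3, .4, .5, .6),
  Y_F = c(2, 3, 5, NA, 7, 1, 3, 4, 1, NA, 3, 4, 5),
  Y_A = c(.2, .3, .4, NA, .7, .1, .2, .7, .1, NA, .3, .4, .5),
  ID  = c("A", "A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "C")
)

# One table per parameter set, keyed by ID and SF; rows with a missing key are dropped
x_part <- DF1 %>% filter(!is.na(X_F)) %>% select(ID, SF = X_F, X_A)
y_part <- DF1 %>% filter(!is.na(Y_F)) %>% select(ID, SF = Y_F, Y_A)

expanded <- full_join(x_part, y_part, by = c("ID", "SF")) %>%
  group_by(ID) %>%
  complete(SF = full_seq(c(1, SF), 1)) %>%  # fill SF from 1 to the group maximum
  ungroup() %>%
  arrange(ID, SF)

print(expanded, n = Inf)
```

This sketch assumes each F value occurs at most once per ID, as it does in the sample data. The X_F and Y_F columns of the expected table simply repeat SF, so they could be added afterwards with mutate(X_F = SF, Y_F = SF) if needed.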
