How to keep column B in row x if column A in row x has a value

Problem description


I have a dataframe like the following:

ID     COL01_A  COL01_B   COL02_A COL02_B  ... COL12_A  COL12_B
1      01       19990101  03      20000101 ... FF       ""
2      03       20170810  FA      20120303 ... ""       ""
3      GG       19940508  DD      20000101 ... 03       20060808
4      03       20180101  09      20000101 ... ""       ""
5      GF       20171212  03      19990101 ... 02       20190101

The values in the type A columns dictate whether the value in the corresponding type B column is the one I'm looking for. In this case, the value of interest is "03". There are twelve such pairs of columns, from COL01_A/COL01_B to COL12_A/COL12_B, as shown in the example.

I am looking for a way to generate a new column (let's call it COL_X) that holds the value of a type B column only if its twin type A column has the value "03". For the example above, the desired result would be something like this:

ID  COL01_A  COL01_B   COL02_A COL02_B  ... COL12_A  COL12_B   COL_X
1   01       19990101  03      20000101 ... FF       ""        20000101
2   03       20170810  FA      20120303 ... ""       ""        20170810     
3   GG       19940508  DD      20000101 ... 03       20060808  20060808
4   03       20180101  09      20000101 ... ""       ""        20180101  
5   GF       20171212  03      19990101 ... 02       20190101  19990101

Right now I've solved my problem using a brutally long nested ifelse statement, which is neither readable nor good practice (in my opinion). It is fast, but I suppose that is only because the data is not too large. I also found another solution using do.call(pmax(...)), but it requires me to clean the data frame first (using ifelse statements) and to create an auxiliary data frame holding all the other information per row.

Is there a way to accomplish this in the fewest lines of code possible and/or without auxiliary structures? A solution using data.table or dplyr would be great.

Basic reproducible example:

ID <- c(1,2,3,4,5)
DATA <- c('xxx', 'yyy', 'zzz','xyz','zxy')
COL01_A<- c('01','03','GG','03','GF')
COL01_B<- c('19990101','20170810','19940508','20180101','20171212')
COL02_A<- c('03','FA','DD','09','03')
COL02_B<- c('20000101','20120303','20000101','20000101','19990101')
COL03_A<- c('FF','','03','','02')
COL03_B<- c('','','20060808','','20190101')

df <- data.frame(ID, DATA, COL01_A,COL01_B,COL02_A,COL02_B,COL03_A,COL03_B)

If a row has several "03" values, COL_X should be "".

Solution

We can find the positions of the A and B columns using grep, then use max.col to find, for each row, which of the A columns has "03" as its value, and finally pick the corresponding value from the B columns.

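# Positions of the type A and type B columns
A_cols <- grep("_A$", names(df))
B_cols <- grep("_B$", names(df))
# For each row, pick the B value whose paired A column equals "03"
df$COL_X <- df[B_cols][cbind(1:nrow(df), max.col(df[A_cols] == "03"))]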

df

#  ID DATA COL01_A  COL01_B COL02_A  COL02_B COL03_A  COL03_B    COL_X
#1  1  xxx      01 19990101      03 20000101      FF          20000101
#2  2  yyy      03 20170810      FA 20120303                  20170810
#3  3  zzz      GG 19940508      DD 20000101      03 20060808 20060808
#4  4  xyz      03 20180101      09 20000101                  20180101
#5  5  zxy      GF 20171212      03 19990101      02 20190101 19990101
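
To make the indexing step easier to follow, the one-liner can be unpacked into intermediate values. The sketch below rebuilds the example data and names each step; the variable names is_03, pair_idx, pick and col_x are introduced here for illustration only. It also passes ties.method = "first" to max.col, because the default tie-breaking method is "random", which matters when a row has no "03" at all or more than one.

```r
# Rebuild the example data from the question
df <- data.frame(
  ID = 1:5,
  DATA = c("xxx", "yyy", "zzz", "xyz", "zxy"),
  COL01_A = c("01", "03", "GG", "03", "GF"),
  COL01_B = c("19990101", "20170810", "19940508", "20180101", "20171212"),
  COL02_A = c("03", "FA", "DD", "09", "03"),
  COL02_B = c("20000101", "20120303", "20000101", "20000101", "19990101"),
  COL03_A = c("FF", "", "03", "", "02"),
  COL03_B = c("", "", "20060808", "", "20190101"),
  stringsAsFactors = FALSE
)

A_cols <- grep("_A$", names(df))
B_cols <- grep("_B$", names(df))

# Logical matrix: TRUE where a type A column equals "03"
is_03 <- df[A_cols] == "03"

# For each row, the position of the first TRUE among the A columns
pair_idx <- max.col(is_03, ties.method = "first")

# Two-column (row, column) matrix used to pick one cell per row
pick <- cbind(seq_len(nrow(df)), pair_idx)
col_x <- as.matrix(df[B_cols])[pick]
```

Because the A and B columns come in the same order, position k among the A columns corresponds to position k among the B columns, which is what makes the (row, column) lookup valid.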

As updated in the comments, if a particular row contains more than one "03" value, we want an empty string as output. We can add one more line for that condition after the code above, and it should work.

df$COL_X <- ifelse(rowSums(df[A_cols] == "03") > 1, "", df$COL_X)
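
The two steps can also be wrapped into one function, which may be handy with all twelve pairs. This is only a sketch: the helper name pick_b_value and the demo data frame are hypothetical and not part of the original answer. It makes one assumption the question does not state, namely that rows with no "03" at all should also get "", since max.col would otherwise still return some column for them.

```r
# Hypothetical helper (not from the original answer): returns the type B value
# paired with the single type A column equal to `code`, or "" otherwise.
pick_b_value <- function(df, code = "03") {
  A_cols <- grep("_A$", names(df))
  B_cols <- grep("_B$", names(df))
  is_code <- as.matrix(df[A_cols]) == code
  n_hits <- rowSums(is_code)
  idx <- max.col(is_code, ties.method = "first")
  out <- as.matrix(df[B_cols])[cbind(seq_len(nrow(df)), idx)]
  # Several matches give "" (as requested in the question);
  # zero matches also give "" (assumption made for this sketch)
  out[n_hits != 1] <- ""
  out
}

# Small demo: one match, two matches, no match
demo <- data.frame(
  COL01_A = c("03", "03", "GG"),
  COL01_B = c("20170810", "20180101", "19940508"),
  COL02_A = c("FA", "03", "DD"),
  COL02_B = c("20120303", "20000101", "20000101"),
  stringsAsFactors = FALSE
)
demo$COL_X <- pick_b_value(demo)
```

Counting the matches with rowSums before trusting the max.col result keeps the tie-breaking behaviour from silently affecting the output.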
