Subsetting a data frame with multiple conditions
Problem description
Say I have a data frame ARAP with columns called CoCd and VendorNo. I want to subset into another data frame, called EMIU_EMIJ, all rows matching any of these combinations:
CoCd = "EMIJ" & VendorNo = "100010", or
CoCd = "EMIU" & VendorNo = "2000001", or
CoCd = "EMIU" & VendorNo = "2000006".
How do I combine & and | to select the rows where both conditions of a combination are met? I.e. the CoCd and VendorNo values need to be matched together as pairs.
I tried
EMIU_EMIJ<-subset(ARAP,CoCd=="EMIJ"&VendorNo=="100010"| CoCd=="EMIU"&VendorNo=="2000001"| CoCd=="EMIU"&VendorNo=="2000006")
I also tried brackets
EMIU_EMIJ<-subset(ARAP, (CoCd=="EMIJ"&VendorNo=="100010")|(CoCd=="EMIU"&VendorNo=="2000001")|(CoCd=="EMIU"&VendorNo=="2000006"))
But this created an error:
Error: unexpected symbol in: "EMIU_EMIJ"
How do I subset for any one of the three combinations mentioned above?
Solution
A simple merge with the all.y option will do. For example, if mydf is your data
set.seed(111)
mydf <- data.frame(id = rep(LETTERS, each = 4)[1:100],
                   replicate(3, sample(1001, 100)),
                   Class = sample(c("Yes", "No"), 100, TRUE))
mydf$CoCd <- paste0("EMI", mydf$id)
mydf$VendorNo <- paste0(mydf$X1, mydf$X2)
mydf <- unique(mydf[, c("CoCd", "VendorNo", "Class", "X3")])
and it looks like this
   CoCd VendorNo Class  X3
1  EMIA   594577   Yes 727
2  EMIA   727137   Yes 921
3  EMIA   371939   Yes 123
4  EMIA   514176    No 950
5  EMIB   377818   Yes 668
6  EMIB    41713    No  85
7  EMIB    11637    No 579
8  EMIB   530266    No 212
9  EMIC   430566   Yes 241
10 EMIC    93958    No 533
11 EMIC   551197   Yes 176
12 EMIC   585686    No 565
13 EMID    67827   Yes 154
14 EMID    47894    No 469
15 EMID   155952    No 718
16 EMID   441649    No 835
17 EMIE   169541   Yes 945
18 EMIE   952871   Yes 452
19 EMIE   306441    No 358
20 EMIE   604730    No 920
21 EMIF   423407    No 868
22 EMIF   280668   Yes 658
23 EMIF   335907   Yes 830
24 EMIF   379620   Yes 841
25 EMIG   946644    No 471
and you want the combinations
combination_to_select <- data.frame(CoCd = c("EMIA", "EMID", "EMIF"),
                                    VendorNo = c('594577', '47894', '423407'),
                                    stringsAsFactors = FALSE)
combination_to_select
  CoCd VendorNo
1 EMIA   594577
2 EMID    47894
3 EMIF   423407
The following code gives you the subset
subset <- merge(mydf, combination_to_select, by = c("CoCd", "VendorNo"), all.y = TRUE)
  CoCd VendorNo Class  X3
1 EMIA   594577   Yes 727
2 EMID    47894    No 469
3 EMIF   423407    No 868
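Applied to the data frame from the question, the same idea might look like the sketch below. The contents of ARAP are not shown in the question, so the toy rows and the Amount column here are made up purely for illustration, and the name wanted is hypothetical. The sketch uses merge's default inner join rather than all.y = TRUE: with all.y = TRUE, a requested pair that has no match in the data comes back as a row of NAs, whereas the inner join simply drops it. A fully parenthesised subset() call is included for comparison and should select the same rows.

```r
# Hypothetical toy data standing in for the asker's ARAP data frame
ARAP <- data.frame(
  CoCd     = c("EMIJ", "EMIJ", "EMIU", "EMIU", "EMIU"),
  VendorNo = c("100010", "999999", "2000001", "2000006", "100010"),
  Amount   = c(10, 20, 30, 40, 50),   # made-up payload column
  stringsAsFactors = FALSE
)

# The (CoCd, VendorNo) pairs to keep, one pair per row
wanted <- data.frame(
  CoCd     = c("EMIJ", "EMIU", "EMIU"),
  VendorNo = c("100010", "2000001", "2000006"),
  stringsAsFactors = FALSE
)

# Inner join: keeps only rows of ARAP whose pair appears in wanted
EMIU_EMIJ <- merge(ARAP, wanted, by = c("CoCd", "VendorNo"))

# Equivalent logical filter, with each pair wrapped in parentheses
EMIU_EMIJ2 <- subset(
  ARAP,
  (CoCd == "EMIJ" & VendorNo == "100010") |
    (CoCd == "EMIU" & VendorNo == "2000001") |
    (CoCd == "EMIU" & VendorNo == "2000006")
)
```

The merge form scales better when the list of pairs grows, since adding a pair means adding a row to wanted rather than another clause to the condition.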