Subsetting dataframe with multiple conditions


Problem description

Say I have a data frame ARAP with columns called CoCd and VendorNo. I want to subset it into another data frame called EMIU_EMIJ, keeping all rows that match any of these combinations:

CoCd="EMIJ" & VendorNo = "100010" or
CoCd="EMIU" & VendorNo = "2000001" or
CoCd="EMIU" & VendorNo = "2000006".

How do I combine & and | to select the rows that match any one of these pairs? I.e. each CoCd value needs to be paired with its own VendorNo value.

I tried

EMIU_EMIJ<-subset(ARAP,CoCd=="EMIJ"&VendorNo=="100010"|
CoCd=="EMIU"&VendorNo=="2000001"|
CoCd=="EMIU"&VendorNo=="2000006")

I also tried brackets

EMIU_EMIJ<-subset(ARAP, (CoCd=="EMIJ"&VendorNo=="100010")|(CoCd=="EMIU"&VendorNo=="2000001")|(CoCd=="EMIU"&VendorNo=="2000006"))

But this produced an error: "Error: unexpected symbol in: "EMIU_EMIJ"

How do I subset for any one of the 3 combinations mentioned above?

Solution

A simple merge with the all.y = TRUE option will do.

For example, if mydf is your data

set.seed(111)
mydf <- data.frame(id=rep(LETTERS, each=4)[1:100], replicate(3, sample(1001, 100)),Class=sample(c("Yes", "No"), 100, TRUE))
mydf$CoCd <- paste0("EMI",mydf$id)
mydf$VendorNo <- paste0(mydf$X1,mydf$X2)
mydf <- unique(mydf[,c("CoCd","VendorNo","Class","X3")])

and looks like this

    CoCd VendorNo Class   X3
1   EMIA   594577   Yes  727
2   EMIA   727137   Yes  921
3   EMIA   371939   Yes  123
4   EMIA   514176    No  950
5   EMIB   377818   Yes  668
6   EMIB    41713    No   85
7   EMIB    11637    No  579
8   EMIB   530266    No  212
9   EMIC   430566   Yes  241
10  EMIC    93958    No  533
11  EMIC   551197   Yes  176
12  EMIC   585686    No  565
13  EMID    67827   Yes  154
14  EMID    47894    No  469
15  EMID   155952    No  718
16  EMID   441649    No  835
17  EMIE   169541   Yes  945
18  EMIE   952871   Yes  452
19  EMIE   306441    No  358
20  EMIE   604730    No  920
21  EMIF   423407    No  868
22  EMIF   280668   Yes  658
23  EMIF   335907   Yes  830
24  EMIF   379620   Yes  841
25  EMIG   946644    No  471

and you want the combinations

combination_to_select<-data.frame(CoCd=c("EMIA","EMID","EMIF"),VendorNo=c('594577','47894','423407'),stringsAsFactors=FALSE)
combination_to_select

  CoCd VendorNo
1 EMIA   594577
2 EMID    47894
3 EMIF   423407

the following code gives you the subset

selected <- merge(mydf,combination_to_select,by=c("CoCd","VendorNo"),all.y=TRUE)
selected
  CoCd VendorNo Class  X3
1 EMIA   594577   Yes 727
2 EMID    47894    No 469
3 EMIF   423407    No 868
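
As a rough sketch of how the same idea might be applied to the ARAP data from the question: the rows and the Amount column below are made up for illustration, and `wanted` is a hypothetical name for the lookup table of pairs. One thing worth checking is that all.y = TRUE also returns requested pairs that do not occur in the data (filled with NA), whereas merge's default inner join returns only the pairs that are present. The last statement shows the equivalent explicit condition, with each pair parenthesised and the pairs joined with |.

```r
# Hypothetical sample data; the real ARAP will have other rows and columns.
ARAP <- data.frame(
  CoCd     = c("EMIJ", "EMIJ", "EMIU", "EMIU", "EMIU"),
  VendorNo = c("100010", "100011", "2000001", "2000006", "2000007"),
  Amount   = c(10, 20, 30, 40, 50),
  stringsAsFactors = FALSE
)

# One row per wanted (CoCd, VendorNo) pair.
# The last pair is deliberately absent from ARAP.
wanted <- data.frame(
  CoCd     = c("EMIJ", "EMIU", "EMIU", "EMIU"),
  VendorNo = c("100010", "2000001", "2000006", "9999999"),
  stringsAsFactors = FALSE
)

# all.y = TRUE keeps every wanted pair; unmatched pairs get NA in Amount.
with_all_y <- merge(ARAP, wanted, by = c("CoCd", "VendorNo"), all.y = TRUE)

# The default (inner) merge returns only the pairs that occur in ARAP.
EMIU_EMIJ <- merge(ARAP, wanted, by = c("CoCd", "VendorNo"))

# The same rows via explicit conditions: & inside each pair, | between pairs.
by_condition <- ARAP[(ARAP$CoCd == "EMIJ" & ARAP$VendorNo == "100010") |
                     (ARAP$CoCd == "EMIU" & ARAP$VendorNo == "2000001") |
                     (ARAP$CoCd == "EMIU" & ARAP$VendorNo == "2000006"), ]
```

If a requested pair should show up even when it has no match, all.y = TRUE is the variant to use; for a plain subset, the inner merge or the explicit condition is probably closer to what the question asks for.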
