Subsetting a dataframe with multiple conditions

Posted: 2017/3/26 4:32:46. Tags: r, dataframe, subset


Question

Say I have a dataframe ARAP with columns called CoCd and VendorNo.
I want to subset into another dataframe, called EMIU_EMIJ, all rows that match any of these combinations:
CoCd == "EMIJ" & VendorNo == "100010", or
CoCd == "EMIU" & VendorNo == "2000001", or
CoCd == "EMIU" & VendorNo == "2000006".
How do I combine & and | to select the rows where both parts of a combination are met?
I.e. each CoCd value needs to be paired with its own VendorNo value.

I tried:
EMIU_EMIJ<-subset(ARAP,CoCd=="EMIJ"&VendorNo=="100010"|
CoCd=="EMIU"&VendorNo=="2000001"|
CoCd=="EMIU"&VendorNo=="2000006")
I also tried brackets:
EMIU_EMIJ<-subset(ARAP, (CoCd=="EMIJ"&VendorNo=="100010")|(CoCd=="EMIU"&VendorNo=="2000001")|(CoCd=="EMIU"&VendorNo=="2000006"))
But this produced an error: Error: unexpected symbol in: "EMIU_EMIJ"

How do I subset for any one of the three combinations mentioned above?
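
For reference, the bracketed condition above is valid R as written. The sketch below runs it on a small, made-up ARAP (the real data is not shown in the question, so these rows are purely illustrative). "unexpected symbol" is a parse error, so it may have come from something typed just before the statement, such as an unclosed quote or parenthesis, rather than from the logic itself; that is only a guess.

```r
# Hypothetical toy version of ARAP; the values are invented for illustration.
ARAP <- data.frame(
  CoCd     = c("EMIJ", "EMIJ", "EMIU", "EMIU", "EMIU"),
  VendorNo = c("100010", "2000001", "2000001", "2000006", "100010"),
  stringsAsFactors = FALSE
)

# & binds more tightly than | in R, so the parentheses are optional here,
# but they make the CoCd/VendorNo pairing explicit.
EMIU_EMIJ <- subset(
  ARAP,
  (CoCd == "EMIJ" & VendorNo == "100010") |
    (CoCd == "EMIU" & VendorNo == "2000001") |
    (CoCd == "EMIU" & VendorNo == "2000006")
)
print(EMIU_EMIJ)
```

Only rows where both columns match the same pair are kept; a row such as EMIJ with 2000001 is dropped.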
Answer
A simple merge with the all.y option will do.

For example, if mydf is your data:
set.seed(111)
mydf <- data.frame(id=rep(LETTERS, each=4)[1:100], replicate(3, sample(1001, 100)),Class=sample(c("Yes", "No"), 100, TRUE))
mydf$CoCd <- paste0("EMI",mydf$id)
mydf$VendorNo <- paste0(mydf$X1,mydf$X2)
mydf <- unique(mydf[,c("CoCd","VendorNo","Class","X3")])
and it looks like this (first rows only):
    CoCd VendorNo Class   X3
1   EMIA   594577   Yes  727
2   EMIA   727137   Yes  921
3   EMIA   371939   Yes  123
4   EMIA   514176    No  950
5   EMIB   377818   Yes  668
6   EMIB    41713    No   85
7   EMIB    11637    No  579
8   EMIB   530266    No  212
9   EMIC   430566   Yes  241
10  EMIC    93958    No  533
11  EMIC   551197   Yes  176
12  EMIC   585686    No  565
13  EMID    67827   Yes  154
14  EMID    47894    No  469
15  EMID   155952    No  718
16  EMID   441649    No  835
17  EMIE   169541   Yes  945
18  EMIE   952871   Yes  452
19  EMIE   306441    No  358
20  EMIE   604730    No  920
21  EMIF   423407    No  868
22  EMIF   280668   Yes  658
23  EMIF   335907   Yes  830
24  EMIF   379620   Yes  841
25  EMIG   946644    No  471
and you want these combinations:
combination_to_select<-data.frame(CoCd=c("EMIA","EMID","EMIF"),VendorNo=c('594577','47894','423407'),stringsAsFactors=FALSE)
combination_to_select

  CoCd VendorNo
1 EMIA   594577
2 EMID    47894
3 EMIF   423407
then the following code gives you the subset:
subset <- merge(mydf,combination_to_select,by=c("CoCd","VendorNo"),all.y=TRUE)
  CoCd VendorNo Class  X3
1 EMIA   594577   Yes 727
2 EMID    47894    No 469
3 EMIF   423407    No 868
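
Applied to the names from the question, the same idea might look like the sketch below. The ARAP rows, the Amount column, and the helper name wanted are assumptions made up for illustration. It also shows one point worth knowing about all.y = TRUE: a wanted combination that does not occur in the data still comes back, as a row filled with NA, whereas merge's default (an inner join) returns only the matching rows.

```r
# Hypothetical ARAP; note that (EMIU, 2000006) is deliberately absent.
ARAP <- data.frame(
  CoCd     = c("EMIJ", "EMIJ", "EMIU", "EMIU"),
  VendorNo = c("100010", "2000001", "2000001", "100010"),
  Amount   = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# The three CoCd/VendorNo pairs from the question (helper name is made up).
wanted <- data.frame(
  CoCd     = c("EMIJ", "EMIU", "EMIU"),
  VendorNo = c("100010", "2000001", "2000006"),
  stringsAsFactors = FALSE
)

# Default merge is an inner join: only pairs present in both data frames.
EMIU_EMIJ <- merge(ARAP, wanted, by = c("CoCd", "VendorNo"))

# all.y = TRUE also keeps wanted pairs with no match, filling Amount with NA.
with_missing <- merge(ARAP, wanted, by = c("CoCd", "VendorNo"), all.y = TRUE)

# Alternative without merge: build a combined key and test membership.
key_data   <- paste(ARAP$CoCd, ARAP$VendorNo)
key_wanted <- paste(wanted$CoCd, wanted$VendorNo)
by_key <- ARAP[key_data %in% key_wanted, ]

print(EMIU_EMIJ)
print(with_missing)
print(by_key)
```

Which variant fits depends on whether unmatched combinations should show up in the result; for a plain subset, the inner join or the key lookup is usually closer to what the question asks for.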


                        