Is there a _merge indicator available after a merge?


Question

Is there a way to get the equivalent of a _merge indicator variable after a merge in dplyr?

I am looking for something similar to Pandas' indicator = True option, which essentially tells you how the merge went (how many matches from each dataset, etc.).

Here is an example in Pandas:

import pandas as pd

df1 = pd.DataFrame({'key1' : ['a','b','c'], 'v1' : [1,2,3]})
df2 = pd.DataFrame({'key1' : ['a','b','d'], 'v2' : [4,5,6]})

match = df1.merge(df2, how = 'left', indicator = True)

Here, after a left join between df1 and df2, you want to know immediately how many rows in df1 found a match in df2 and how many did not:

match
Out[53]: 
  key1  v1   v2     _merge
0    a   1  4.0       both
1    b   2  5.0       both
2    c   3  NaN  left_only

I can then tabulate this merge variable:

match._merge.value_counts()
Out[52]: 
both          2
left_only     1
right_only    0
Name: _merge, dtype: int64

I don't see any such option available after, say, a left join in dplyr:

library(dplyr)

key1 = c('a','b','c')
v1 = c(1,2,3)
key2 = c('a','b','d')
v2 = c(4,5,6)
df1 = data.frame(key1,v1)
df2 = data.frame(key2,v2)

> left_join(df1,df2, by = c('key1' = 'key2'))
  key1 v1 v2
1    a  1  4
2    b  2  5
3    c  3 NA

Am I missing something here? Thanks!

Recommended answer

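We can combine an inner_join (rows of df1 with a match in df2) with an anti_join (rows of df1 with no match in df2), label each piece, and stack the two with bind_rows: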

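# Rows of df1 that found a match in df2
d1 <- inner_join(df1, df2, by = c('key1' = 'key2')) %>%
  mutate(merge = "both")
# Rows of df1 with no match in df2, stacked below the matched rows
bind_rows(d1, anti_join(df1, df2, by = c('key1' = 'key2')) %>%
            mutate(merge = 'left_only'))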
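
To see why this decomposition reproduces the Pandas indicator, the same logic can be sketched in plain Python using only the standard library. This is an illustration under stated assumptions, not how dplyr or Pandas implement joins: the helper name left_join_with_indicator is hypothetical, tables are modelled as lists of dicts, and missing values are represented by None.

```python
from collections import Counter


def left_join_with_indicator(left, right, left_key, right_key):
    """Left join two lists of dicts, labelling each row 'both' or 'left_only'.

    Hypothetical helper for illustration only.
    """
    # Index the right table by its key so lookups are cheap.
    right_index = {}
    for row in right:
        right_index.setdefault(row[right_key], []).append(row)

    # Non-key columns of the right table, used to pad unmatched rows.
    right_cols = [c for c in right[0] if c != right_key] if right else []

    result = []
    for row in left:
        matches = right_index.get(row[left_key], [])
        if matches:
            # "inner_join" branch: the key exists in both tables.
            for match in matches:
                merged = dict(row)
                merged.update({c: match[c] for c in right_cols})
                merged["merge"] = "both"
                result.append(merged)
        else:
            # "anti_join" branch: the key exists only in the left table.
            merged = dict(row)
            merged.update({c: None for c in right_cols})
            merged["merge"] = "left_only"
            result.append(merged)
    return result


df1 = [{"key1": "a", "v1": 1}, {"key1": "b", "v1": 2}, {"key1": "c", "v1": 3}]
df2 = [{"key2": "a", "v2": 4}, {"key2": "b", "v2": 5}, {"key2": "d", "v2": 6}]

match = left_join_with_indicator(df1, df2, "key1", "key2")
counts = Counter(row["merge"] for row in match)
print(counts)  # Counter({'both': 2, 'left_only': 1})
```

Unlike the bind_rows version, which lists all matched rows first, this sketch keeps the rows in the order of the left table; the counts per label are the same.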
