Pandas: How to get the count of occurrences from another data frame?
Problem description
I am using Python pandas. I have two DataFrames, df1 and df2. df1 contains header-level data, such as card_id and the first-active month. df2 contains granular, transaction-level data: one row for each transaction performed by a given card_id. The card_id column is common to both DataFrames.
df1:
first_active_month card_id feature_1 feature_2 feature_3
2017-06 C_ID_92a2005557 5 2 1
2017-01 C_ID_3d0044924f 4 1 0
2016-08 C_ID_d639edf6cd 2 2 0
2017-09 C_ID_186d6a6901 4 3 0
2017-11 C_ID_cdbd2c0db2 1 3 0
df2:
junk_id authorized_flag card_id city_id Authorized
13292136 Y C_ID_92a2005557 101 N
20069042 Y C_ID_7a238b3713 69 N
5029656 Y C_ID_92a2005557 17 N
16356907 N C_ID_3d0044924f -1 Y
8203441 Y C_ID_fcf33361c2 17 N
I want to add a column "frequency" to df1 that holds, for each card_id in df1, the number of times that card_id occurs in df2. The result should look like this:
df1 (desired result):
first_active_month card_id feature_1 feature_2 feature_3 frequency
2017-06 C_ID_92a2005557 5 2 1 2
2017-01 C_ID_3d0044924f 4 1 0 5
2016-08 C_ID_d639edf6cd 2 2 0 3
2017-09 C_ID_186d6a6901 4 3 0 1
2017-11 C_ID_cdbd2c0db2 1 3 0 7
Please note: I am new to Python/pandas. I have already gone through multiple threads on this site, but all of them refer to counting within the same DataFrame. I am looking for a way to do this count using join/merge functionality. Threads I have already browsed: this, this, this, this, this, this, this.
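Because the question asks specifically about join/merge, here is a minimal sketch of the same count expressed as a left merge. It assumes only the sample rows shown above, with the feature columns and the other df2 columns omitted for brevity; the name `counts` is purely illustrative. df2 is aggregated first with `groupby(...).size()` so that the merge is one-to-one.

```python
import pandas as pd

# Sample rows from the question (feature columns and extra df2 columns omitted).
df1 = pd.DataFrame({
    "first_active_month": ["2017-06", "2017-01", "2016-08", "2017-09", "2017-11"],
    "card_id": ["C_ID_92a2005557", "C_ID_3d0044924f", "C_ID_d639edf6cd",
                "C_ID_186d6a6901", "C_ID_cdbd2c0db2"],
})
df2 = pd.DataFrame({
    "card_id": ["C_ID_92a2005557", "C_ID_7a238b3713", "C_ID_92a2005557",
                "C_ID_3d0044924f", "C_ID_fcf33361c2"],
})

# One row per card_id that appears in df2, with its number of transactions.
counts = df2.groupby("card_id").size().reset_index(name="frequency")

# A left merge keeps every df1 row (in df1's order); cards absent from df2 get NaN.
df1 = df1.merge(counts, on="card_id", how="left")

# Replace NaN with 0 and restore an integer dtype (NaN had forced float).
df1["frequency"] = df1["frequency"].fillna(0).astype(int)
print(df1)
```

Counting before merging is the important design choice: merging df1 directly with the raw df2 would repeat each df1 row once per matching transaction.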
Recommended answer
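I think you need Series.map with Series.value_counts, plus Series.fillna to replace missing values. df2['card_id'].value_counts() returns a Series indexed by card_id that holds each id's count in df2. map looks up every card_id of df1 in that index, and ids that never appear in df2 become NaN. Because NaN turns the column into floats, fillna(0) is followed by astype(int). With only the five sample rows of df2 shown in the question, the counts come out as 2, 1, 0, 0, 0 rather than the illustrative values in the desired output.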
df1['frequency'] = df1['card_id'].map(df2['card_id'].value_counts()).fillna(0).astype(int)
print(df1)
first_active_month card_id feature_1 feature_2 feature_3 frequency
0 2017-06 C_ID_92a2005557 5 2 1 2
1 2017-01 C_ID_3d0044924f 4 1 0 1
2 2016-08 C_ID_d639edf6cd 2 2 0 0
3 2017-09 C_ID_186d6a6901 4 3 0 0
4 2017-11 C_ID_cdbd2c0db2 1 3 0 0
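To run the answer end to end, here is a self-contained version built from the sample rows in the question. The feature columns are omitted for brevity, and the intermediate `counts` Series is split out only to make each step visible.

```python
import pandas as pd

# Sample rows from the question (feature columns and extra df2 columns omitted).
df1 = pd.DataFrame({
    "card_id": ["C_ID_92a2005557", "C_ID_3d0044924f", "C_ID_d639edf6cd",
                "C_ID_186d6a6901", "C_ID_cdbd2c0db2"],
})
df2 = pd.DataFrame({
    "card_id": ["C_ID_92a2005557", "C_ID_7a238b3713", "C_ID_92a2005557",
                "C_ID_3d0044924f", "C_ID_fcf33361c2"],
})

# Series indexed by card_id, holding how often each id occurs in df2.
counts = df2["card_id"].value_counts()

# map() looks up each df1 card_id in counts' index; unmatched ids become NaN,
# so fillna(0) supplies a zero count and astype(int) restores integers.
df1["frequency"] = df1["card_id"].map(counts).fillna(0).astype(int)
print(df1)
```

Because map returns a Series aligned with df1's index, this approach leaves df1's row count and order unchanged.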