Matching keys from two different dataframes
Problem description
I have two dataframes,
df1,
     Name  Stage                                Description      key
0     Sri      1  Sri is one of the good singer in this two      one
1     NaN      2                         Thanks for reading  two has
2     Ram      1      Ram is two of the good cricket player    three
3  ganesh      1                                 one driver     four
4     NaN      2                               good buddies      NaN
df2,
values
member of four
one of three friends
sri is a cricketer
Rahul has two brothers
I want to replace each df1["key"] with the matching df2 value, if the key is present in df2["values"].
I tried:
df1["key"] = df2[df2["values"].str.contains("|".join(df2["values"].tolist()), na=False)]
But I am getting the output in the same order. I want,
output_df,
     Name  Stage                                Description                     key
0     Sri      1  Sri is one of the good singer in this two    one of three friends
1     NaN      2                         Thanks for reading  Rahul has two brothers
2     Ram      1      Ram is two of the good cricket player    one of three friends
3  ganesh      1                                 one driver          member of four
4     NaN      2                               good buddies                     NaN
Recommended answer
I'll use arrays of sets, with <= for subset testing and numpy broadcasting.
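As a small illustration of the idea (the sample strings below are picked for the demo and the variable names are my own): Python's <= on two sets tests whether the left set is a subset of the right one, and on object-dtype numpy arrays the comparison is applied element by element, so a column of key sets can be broadcast against a row of value sets.

```python
import numpy as np

# Sample data for illustration only.
keys = ["one", "two has", "five"]
values = ["member of four", "one of three friends", "Rahul has two brothers"]

# 1-D object arrays in which each element is a set of words.
a = np.array([set(s.split()) for s in keys], dtype=object)
b = np.array([set(s.split()) for s in values], dtype=object)

# a[:, None] has shape (3, 1) and b has shape (3,). Broadcasting yields a
# (3, 3) grid where cell [i, j] is True when keys[i] is a subset of values[j].
matches = (a[:, None] <= b).astype(bool)
print(matches)
```

Each row of matches corresponds to one key and each column to one candidate value, which is why the answer reduces along axis 1 with any and argmax.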
import numpy as np
import pandas as pd

setify = lambda x: set(x.split())
v = df2['values'].values.astype(str)
k = df1['key'].values.astype(str)
i = df1.index
# These are the sets (object arrays holding one set of words per row)
a = np.array([setify(x) for x in k.tolist()], dtype=object)
b = np.array([setify(x) for x in v.tolist()], dtype=object)
# This is the broadcasting: matches[r, c] is True when key r is a subset of value c
matches = (a[:, None] <= b)
# Additional test that a match exists at all for each key
any_ = matches.any(1)
# Test that the key wasn't null in the first place
nul_ = df1['key'].notnull().values
mask = any_ & nul_
# Use argmax to find where the first set match is. There
# may be more than one match. I chose to use `assign`,
# so I used `mask` to pass a slice of a Series
# that targets only the correct rows.
df1.assign(key1=pd.Series(v[matches.argmax(1)], i)[mask])
     Name  Stage                                Description      key                    key1
0     Sri      1  Sri is one of the good singer in this two      one    one of three friends
1     NaN      2                         Thanks for reading  two has  Rahul has two brothers
2     Ram      1      Ram is two of the good cricket player    three    one of three friends
3  ganesh      1                                 one driver     four          member of four
4     NaN      2                               good buddies      NaN                     NaN
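If the goal is to overwrite key itself rather than add a key1 column, the same subset test can be written as a self-contained script. This is only a sketch under assumptions: the frames are rebuilt by hand from the tables in the question, the helper name first_match is hypothetical, and a plain Python loop stands in for the broadcasting step, which is easier to read but likely slower on large frames. Keys with no match are left unchanged.

```python
import numpy as np
import pandas as pd

# Frames rebuilt by hand from the tables in the question.
df1 = pd.DataFrame({
    "Name": ["Sri", np.nan, "Ram", "ganesh", np.nan],
    "Stage": [1, 2, 1, 1, 2],
    "Description": [
        "Sri is one of the good singer in this two",
        "Thanks for reading",
        "Ram is two of the good cricket player",
        "one driver",
        "good buddies",
    ],
    "key": ["one", "two has", "three", "four", np.nan],
})
df2 = pd.DataFrame({"values": [
    "member of four",
    "one of three friends",
    "sri is a cricketer",
    "Rahul has two brothers",
]})

# Precompute the word set of every candidate value once.
value_sets = [(v, set(v.split())) for v in df2["values"]]

def first_match(key):
    """Hypothetical helper: first df2 value containing every word of key."""
    if pd.isna(key):
        return key  # leave missing keys untouched
    words = set(key.split())
    for value, value_set in value_sets:
        if words <= value_set:  # subset test, as in the answer
            return value
    return key  # no match: keep the original key

# assign returns a new frame; df1 itself is not modified.
out = df1.assign(key=df1["key"].map(first_match))
print(out["key"].tolist())
```

Like argmax in the answer, the loop returns the first value that matches when several do.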