Elegant way to replace values in pandas.DataFrame from another DataFrame
Question
I have a data frame in which I want to replace the values in one column with values from another data frame.
import pandas as pd

df = pd.DataFrame({'id1': [1001,1002,1001,1003,1004,1005,1002,1006],
                   'value1': ["a","b","c","d","e","f","g","h"],
                   'value3': ["yes","no","yes","no","no","no","yes","no"]})
dfReplace = pd.DataFrame({'id2': [1001,1002],
                          'value2': ["rep1","rep2"]})
My current solution uses a groupby on the common key together with a loop. Is there a more elegant (and faster) way to do this, for example with .map() or .apply()? I initially wanted to use DataFrame.update(), but that doesn't seem to be the right approach.
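# Group by the column name itself; with a one-element list, recent pandas versions yield tuple keys
groups = dfReplace.groupby('id2')
for key, group in groups:
    df.loc[df['id1'] == key, 'value1'] = group['value2'].values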
Output
df
    id1 value1 value3
0  1001   rep1    yes
1  1002   rep2     no
2  1001   rep1    yes
3  1003      d     no
4  1004      e     no
5  1005      f     no
6  1002   rep2    yes
7  1006      h     no
Recommended answer
This is a little cleaner if you already have the indexes set to the id columns, but if not you can still do it in one line:
>>> (dfReplace.set_index('id2').rename(columns={'value2': 'value1'})
...     .combine_first(df.set_index('id1')))
     value1 value3
1001   rep1    yes
1001   rep1    yes
1002   rep2     no
1002   rep2    yes
1003      d     no
1004      e     no
1005      f     no
1006      h     no
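Note that the result above is indexed by id and sorted by it, so the original row order and the id1 column are not preserved. If those matter, one alternative is a lookup with Series.map, which the question mentions. This is not part of the answer above, only a minimal sketch that assumes the id2 values in dfReplace are unique; the name lookup is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'id1': [1001, 1002, 1001, 1003, 1004, 1005, 1002, 1006],
    'value1': ["a", "b", "c", "d", "e", "f", "g", "h"],
    'value3': ["yes", "no", "yes", "no", "no", "no", "yes", "no"],
})
dfReplace = pd.DataFrame({'id2': [1001, 1002],
                          'value2': ["rep1", "rep2"]})

# Build an id -> replacement Series (assumption: id2 values are unique).
lookup = dfReplace.set_index('id2')['value2']

# map() gives NaN for ids without a replacement;
# fillna() then keeps the existing value1 in those rows.
df['value1'] = df['id1'].map(lookup).fillna(df['value1'])

print(df)
```

Because the assignment happens in place on df, the row order, the default integer index, and the id1 column all stay as they were.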
If you split this into three lines and do the renaming and re-indexing separately, you can see that combine_first() by itself is actually very simple:
>>> df = df.set_index('id1')
>>> dfReplace = dfReplace.set_index('id2').rename( columns={'value2':'value1'} )
>>> dfReplace.combine_first(df)
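To see what combine_first() does in isolation, here is a small sketch on toy data. The frames preferred and fallback are hypothetical and not from the question. The calling frame's non-null values win, and anything missing (a null cell, a missing row, or a missing column) is filled from the argument.

```python
import pandas as pd

# Hypothetical toy frames, for illustration only.
preferred = pd.DataFrame({'value1': ['rep1', None]}, index=[1, 2])
fallback = pd.DataFrame({'value1': ['a', 'b', 'c'],
                         'value3': ['x', 'y', 'z']}, index=[1, 2, 3])

# Row 1: 'rep1' is kept because the caller has a non-null value.
# Row 2: the caller's value is null, so 'b' comes from fallback.
# Row 3 and column value3 exist only in fallback and are taken from it.
combined = preferred.combine_first(fallback)

print(combined)
```

This is why renaming value2 to value1 matters in the answer: the two frames are matched on both index labels and column names.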