Elegant way to replace values in pandas.DataFrame from another DataFrame

Question

I have a DataFrame in which I want to replace the values in one column with values from another DataFrame.

import pandas as pd

df = pd.DataFrame({'id1': [1001, 1002, 1001, 1003, 1004, 1005, 1002, 1006],
                   'value1': ["a", "b", "c", "d", "e", "f", "g", "h"],
                   'value3': ["yes", "no", "yes", "no", "no", "no", "yes", "no"]})

dfReplace = pd.DataFrame({'id2': [1001, 1002],
                          'value2': ["rep1", "rep2"]})

My current solution uses a groupby on the common key together with a loop. Is there a more elegant (and faster) way to do this, for example with .map() or .apply()? I initially wanted to use DataFrame.update(), but that does not seem to be the right approach.

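# Group by the column name itself (not a one-element list), so each key is a scalar
groups = dfReplace.groupby('id2')

# For each key, overwrite value1 in the matching rows of df
for key, group in groups:
    df.loc[df['id1'] == key, 'value1'] = group['value2'].values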

Output

df
    id1   value1 value3
0   1001  rep1   yes
1   1002  rep2   no
2   1001  rep1   yes
3   1003  d      no
4   1004  e      no
5   1005  f      no
6   1002  rep2   yes
7   1006  h      no

Solution

This is a little cleaner if both frames already have the id as their index, but if not, you can still do it in one line:

>>> (dfReplace.set_index('id2').rename(columns={'value2': 'value1'})
...           .combine_first(df.set_index('id1')))

     value1 value3
1001   rep1    yes
1001   rep1    yes
1002   rep2     no
1002   rep2    yes
1003      d     no
1004      e     no
1005      f     no
1006      h     no
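To make the role of combine_first() in that one-liner concrete, here is a minimal sketch on two tiny frames with unique index labels. The frames `a` and `b` are hypothetical stand-ins for the renamed dfReplace and the re-indexed df. Values present in the calling frame win, and anything missing there is filled in from the other frame.

```python
import pandas as pd

# Hypothetical toy frames: `a` holds replacements, `b` holds the original data
a = pd.DataFrame({'value1': ['rep1']}, index=[1001])
b = pd.DataFrame({'value1': ['a', 'd'],
                  'value3': ['yes', 'no']}, index=[1001, 1003])

# Values from `a` take priority; rows and columns missing in `a` come from `b`
out = a.combine_first(b)
print(out)
```

Row 1001 takes its value1 from `a`, while row 1003 and the whole value3 column are carried over from `b`.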

If you separate into three lines and do the renaming and re-indexing separately, you can see that the combine_first() by itself is actually very simple:

>>> df = df.set_index('id1')
>>> dfReplace = dfReplace.set_index('id2').rename(columns={'value2': 'value1'})

>>> dfReplace.combine_first(df)
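Since the question mentions .map(), here is an alternative sketch that is not part of the answer above. It keeps id1 as a regular column and preserves the original row order, which the combine_first() result does not. It assumes each id2 appears only once in dfReplace; the name `lookup` is introduced here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({'id1': [1001, 1002, 1001, 1003, 1004, 1005, 1002, 1006],
                   'value1': ["a", "b", "c", "d", "e", "f", "g", "h"],
                   'value3': ["yes", "no", "yes", "no", "no", "no", "yes", "no"]})
dfReplace = pd.DataFrame({'id2': [1001, 1002],
                          'value2': ["rep1", "rep2"]})

# Build an id -> replacement lookup (assumes id2 values are unique)
lookup = dfReplace.set_index('id2')['value2']

# map() yields NaN for ids without a replacement; fillna() restores the original value
df['value1'] = df['id1'].map(lookup).fillna(df['value1'])
print(df)
```

Whether this is faster than the loop depends on the data, but it avoids one boolean mask per key.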
