Replacing values of a column from another dataframe

Question

I have a dataframe with 10000+ rows which looks like this:

import numpy as np
import pandas as pd

df = pd.DataFrame([['110', 'Demand', 2344, 30953],
                   ['111', 'Supply', 3535, 321312],
                   ['112', 'Supply', 35345, 2324],
                   ['113', 'Demand', 24345, 4542],
                   ['114', 'Supply', 342, 435623]],
                  columns=['Material', 'Title', '201950', '201951'])
df

Material    Title   201950  201951
110         Demand  2344    30953
111         Supply  3535    321312
112         Supply  35345   2324
113         Demand  24345   4542
114         Supply  342     435623

I have another small dataframe with around 4-5 rows that looks like this:

extra = pd.DataFrame([['111', 'Supply', 10],
                     ['112', 'Supply', 20],
                     ['114', 'Supply', 30],
                     ['115', 'Supply', 40]],
                    columns=['Material', 'Title', '201950'])
extra

Material    Title   201950
111         Supply    10
112         Supply    20
114         Supply    30
115         Supply    40

I want to replace the values in column 201950 in df using values from extra wherever Material and Title match, so that the resulting dataframe looks like this:

Material    Title   201950  201951
110         Demand   2344   30953
111         Supply     10   321312
112         Supply     20   2324
113         Demand   24345  4542
114         Supply     30   435623

I did try a merge:

updated = df.merge(extra, how='left',
                       on=['Material', 'Title'],
                       suffixes=('', '_new'))
new = '201950_new'
updated['201950'] = np.where(pd.notnull(updated[new]), updated[new], updated['201950'])
updated.drop(new, axis=1, inplace=True)

This gives me the required output, but I am looking for a more efficient solution, since df is huge and extra has only 4 rows.

Recommended answer

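Use DataFrame.update, but first create a MultiIndex from the Material and Title columns in both DataFrames. update modifies df in place, aligns rows on the index, and ignores keys that exist only in extra (such as 115). Depending on the pandas version, the updated column may come back as float, hence the astype(int) before reset_index: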

df = df.set_index(['Material','Title'])
extra = extra.set_index(['Material','Title'])

df.update(extra)
df = df.astype(int).reset_index()
print(df)
  Material   Title  201950  201951
0      110  Demand    2344   30953
1      111  Supply      10  321312
2      112  Supply      20    2324
3      113  Demand   24345    4542
4      114  Supply      30  435623
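
For reuse, the same idea can be wrapped in a small function. This is only a sketch under some assumptions: the name replace_from and its keys parameter are made up for illustration, and the key pairs are assumed to be unique in both frames. It works on indexed copies so the caller's df is left unchanged, and it restores the original column dtypes instead of casting everything to int.

```python
import pandas as pd


def replace_from(df, extra, keys=("Material", "Title")):
    """Return a copy of df with cells overwritten by extra where keys match.

    Hypothetical helper for illustration; assumes the key pairs are unique.
    """
    keys = list(keys)
    # set_index returns new frames, so the caller's inputs are not modified.
    result = df.set_index(keys)
    patch = extra.set_index(keys)
    original_dtypes = result.dtypes.to_dict()
    # update() aligns on the index, overwrites matching cells in place,
    # and ignores keys that are present only in patch.
    result.update(patch)
    # Restore dtypes in case the update turned integer columns into floats.
    return result.astype(original_dtypes).reset_index()


df = pd.DataFrame([['110', 'Demand', 2344, 30953],
                   ['111', 'Supply', 3535, 321312],
                   ['112', 'Supply', 35345, 2324],
                   ['113', 'Demand', 24345, 4542],
                   ['114', 'Supply', 342, 435623]],
                  columns=['Material', 'Title', '201950', '201951'])
extra = pd.DataFrame([['111', 'Supply', 10],
                      ['112', 'Supply', 20],
                      ['114', 'Supply', 30],
                      ['115', 'Supply', 40]],
                     columns=['Material', 'Title', '201950'])

updated = replace_from(df, extra)
print(updated['201950'].tolist())  # [2344, 10, 20, 24345, 30]
```

Row 115 from extra does not appear in the result because update only touches keys already present in df.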
