PANDAS vlookup against series with common index using map


Problem description

import pandas as pd
import numpy as np

pb = {"mark_up_id":{"0":"123","1":"456","2":"789","3":"111","4":"222"},"mark_up":{"0":1.2987,"1":1.5625,"2":1.3698,"3":1.3333,"4":1.4589}}

data = {"id":{"0":"K69","1":"K70","2":"K71","3":"K72","4":"K73","5":"K74","6":"K75","7":"K79","8":"K86","9":"K100"},"cost":{"0":29.74,"1":9.42,"2":9.42,"3":9.42,"4":9.48,"5":9.48,"6":24.36,"7":5.16,"8":9.8,"9":3.28},"mark_up_id":{"0":"123","1":"456","2":"789","3":"111","4":"222","5":"333","6":"444","7":"555","8":"666","9":"777"}}

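# Lookup table: mark_up keyed by mark_up_id
pb = pd.DataFrame(data=pb).set_index('mark_up_id')
# Main table: one row per item, with the mark_up_id to look up
df = pd.DataFrame(data=data)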


I know that I can use something like:

df['mark_up_id'].map(pb['mark_up'])


to perform a VLOOKUP-style lookup. I'd like to multiply the mark-up this returns by the cost in the same row (the two share a common index) to produce a new column called price.
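Passing a Series to `Series.map` makes it behave like a dictionary lookup: each value of the calling Series is matched against the index of the argument, the result keeps the caller's index, and values with no match become `NaN`. A minimal sketch with a few ids taken from the question's data (`ids` is a hypothetical example Series):

```python
import pandas as pd

# Lookup table: index = mark_up_id, values = mark_up (as in the question)
pb = pd.DataFrame(
    {"mark_up_id": ["123", "456", "789", "111", "222"],
     "mark_up": [1.2987, 1.5625, 1.3698, 1.3333, 1.4589]}
).set_index("mark_up_id")

# Hypothetical ids to look up; "333" is not in pb's index
ids = pd.Series(["123", "456", "333"])

# Each id is matched against pb.index; unmatched "333" becomes NaN
looked_up = ids.map(pb["mark_up"])
print(looked_up)
```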


I know I can merge the two and then run the calculation; that's how I produced the desired output. I'd like to do this the way you'd loop through one dictionary, use its keys to find values in another dictionary, and perform some computation inside the loop. Considering PANDAS dataframes sit on top of dictionaries, there must be a way of using some combination of join/map/apply to do this without actually joining the two datasets in memory.
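For comparison, the dictionary-loop version described above might look like the sketch below; `mark_ups` and `rows` are hypothetical plain-Python stand-ins for the two frames. `Series.map` also accepts a plain dict and performs the same key lookup for the whole column at once, without building a merged frame, although it still allocates an intermediate Series of looked-up mark-ups.

```python
import pandas as pd

# Hypothetical plain-dict versions of the two tables
mark_ups = {"123": 1.2987, "456": 1.5625, "789": 1.3698, "111": 1.3333, "222": 1.4589}
rows = [
    {"id": "K69", "cost": 29.74, "mark_up_id": "123"},
    {"id": "K70", "cost": 9.42, "mark_up_id": "456"},
    {"id": "K74", "cost": 9.48, "mark_up_id": "333"},  # no matching mark-up
]

# Loop version: look each key up in the other dict and compute inline
loop_prices = []
for row in rows:
    m = mark_ups.get(row["mark_up_id"])  # None when the key is missing
    loop_prices.append(row["cost"] * m if m is not None else float("nan"))

# Vectorized version: the same lookup done column-wise with Series.map
df = pd.DataFrame(rows)
vec_prices = df["mark_up_id"].map(mark_ups) * df["cost"]
print(loop_prices)
print(vec_prices.tolist())
```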

Desired output:

desired_output = {"cost":{"0":29.74,"1":9.42,"2":9.42,"3":9.42,"4":9.48},"id":{"0":"K69","1":"K70","2":"K71","3":"K72","4":"K73"},"mark_up_id":{"0":"123","1":"456","2":"111","3":"123","4":"789"},"price":{"0":38.623338,"1":14.71875,"2":12.559686,"3":12.233754,"4":12.985704}}
do = pd.DataFrame(data=desired_output)
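For reference, the merge-then-multiply route mentioned in the question can be sketched next to a `map`-based lookup. The frames below are rebuilt from the `pb` and `data` dictionaries above (restated as lists). In that data, `mark_up_id`s `333` through `777` have no entry in `pb`, so both routes leave `NaN` for those rows.

```python
import numpy as np
import pandas as pd

# Lookup table, indexed by mark_up_id (same values as pb above)
pb = pd.DataFrame(
    {"mark_up_id": ["123", "456", "789", "111", "222"],
     "mark_up": [1.2987, 1.5625, 1.3698, 1.3333, 1.4589]}
).set_index("mark_up_id")

# Same rows as the data dictionary above
df = pd.DataFrame({
    "id": ["K69", "K70", "K71", "K72", "K73", "K74", "K75", "K79", "K86", "K100"],
    "cost": [29.74, 9.42, 9.42, 9.42, 9.48, 9.48, 24.36, 5.16, 9.8, 3.28],
    "mark_up_id": ["123", "456", "789", "111", "222", "333", "444", "555", "666", "777"],
})

# Merge route: left-join the mark-ups onto df, then multiply
merged = df.merge(pb, left_on="mark_up_id", right_index=True, how="left")
merge_price = (merged["mark_up"] * merged["cost"]).to_numpy()

# map route: look up each mark-up and multiply, without keeping a merged frame
map_price = (df["mark_up_id"].map(pb["mark_up"]) * df["cost"]).to_numpy()

# Same values, with NaN in the same rows (ids 333 to 777 have no mark-up)
print(np.allclose(merge_price, map_price, equal_nan=True))  # True
print(int(np.isnan(map_price).sum()))                       # 5
```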

Bonus points:


Explain the difference between the accepted answer and...

pb.loc[df['mark_up_id']]['mark_up'] * df.set_index('mark_up_id')['cost']


and why the following lambda function, from which I derived the above, hits an error...

df.apply(lambda x : x['cost']*pb.loc[x['mark_up_id']],axis=1 )

which returns the error message:

KeyError: ('the label [333] is not in the [index]', u'occurred at index 5')
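In the `apply` version, `pb.loc[x['mark_up_id']]` is evaluated one row at a time with a single label, so the first `mark_up_id` that is missing from `pb`'s index (`'333'`, on row 5 of the question's data) raises a `KeyError` and aborts the whole `apply`; `map`, by contrast, returns `NaN` for missing keys. Note also that `pb.loc[label]` returns a whole row (a Series), not the bare mark-up number. A minimal sketch of a row-wise version that tolerates missing ids, assuming `NaN` is an acceptable price for them (the three-row `df` is a hypothetical subset of the question's data):

```python
import numpy as np
import pandas as pd

pb = pd.DataFrame(
    {"mark_up_id": ["123", "456", "789", "111", "222"],
     "mark_up": [1.2987, 1.5625, 1.3698, 1.3333, 1.4589]}
).set_index("mark_up_id")

df = pd.DataFrame({"id": ["K69", "K70", "K74"],
                   "cost": [29.74, 9.42, 9.48],
                   "mark_up_id": ["123", "456", "333"]})  # '333' is not in pb

# Original lambda: pb.loc['333'] raises KeyError on the third row
raised = False
try:
    df.apply(lambda x: x["cost"] * pb.loc[x["mark_up_id"]], axis=1)
except KeyError as err:
    raised = True
    print("KeyError:", err)

# Row-wise fix: Series.get returns the default (NaN) instead of raising
mark_up = pb["mark_up"]
df["price"] = df.apply(
    lambda x: x["cost"] * mark_up.get(x["mark_up_id"], np.nan), axis=1
)
print(df)
```

This still calls a Python function per row, so for large frames the vectorized `map` form is usually preferable.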

Recommended answer

Try

df['price'] = df['mark_up_id'].map(pb['mark_up']) * df['cost']

You get

    cost    id  mark_up_id  price
0   29.74   K69 123         38.623338
1   9.42    K70 456         14.718750
2   9.42    K71 111         12.559686
3   9.42    K72 123         12.233754
4   9.48    K73 789         12.985704
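On the bonus question: the `.loc` expression and the `map` answer compute the same products but label them differently. `map` keeps `df`'s original index, so the result lines up row-for-row with `df` and can be assigned straight back as a column. In the `.loc` expression both operands are indexed by `mark_up_id`, so the product carries those labels instead of `0, 1, 2, ...`, and assigning it directly to `df` would align on those labels rather than on row position. In addition, with pandas 1.0 or later, `.loc` with a list of labels raises `KeyError` if any label is missing from `pb`'s index, whereas `map` returns `NaN`. A minimal sketch, restricted to ids that exist in `pb` so the `.loc` version runs:

```python
import pandas as pd

pb = pd.DataFrame(
    {"mark_up_id": ["123", "456", "789", "111", "222"],
     "mark_up": [1.2987, 1.5625, 1.3698, 1.3333, 1.4589]}
).set_index("mark_up_id")

# Only ids that exist in pb, so .loc does not raise
df = pd.DataFrame({"id": ["K69", "K70", "K71"],
                   "cost": [29.74, 9.42, 9.42],
                   "mark_up_id": ["123", "456", "111"]})

# Accepted answer: result keeps df's RangeIndex (0, 1, 2)
via_map = df["mark_up_id"].map(pb["mark_up"]) * df["cost"]

# .loc version: both operands are indexed by mark_up_id,
# so the result is labelled '123', '456', '111'
via_loc = pb.loc[df["mark_up_id"]]["mark_up"] * df.set_index("mark_up_id")["cost"]

print(via_map.index.tolist())  # [0, 1, 2]
print(via_loc.index.tolist())  # ['123', '456', '111']

# Same numbers, different labels; drop the labels before storing as a column
df["price"] = via_loc.to_numpy()
```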
