如何匹配和合并两个具有完全不同值的数据框(除了数据框列中的数字)? [英] How to match and merge two dataframes having completely different values except numericals in columns of dataframe?
本文介绍了如何匹配和合并两个具有完全不同值的数据框(除了数据框列中的数字)?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
具有值的数据框ABC
have a dataframe ABC of value
id | price | type
0 easdca | Rs.1,599.00 was trasn by you | unknown
1 vbbngy | txn of INR 191.00 using | unknown
2 awerfa | Rs.190.78 credits was used by you | unknown
3 zxcmo5 | DLR.2000 credits was used by you | unknown
和其他XYZ值
price | type
0 190.78 | food
1 191.00 | movie
2 2,000 | football
3 1,599.00 | basketball
如何使用ABC映射XYZ,以便ABC中的类型使用值更新为xyz类型(数字)XYZ价格。
how to map XYZ with ABC ,so that type in ABC get updated with type in xyz using values(numericals) in price of XYZ .
我需要的输出
id | price | type
0 easdca | Rs.1,599.00 was trasn by you | basketball
1 vbbngy | txn of INR 191.00 using | movie
2 awerfa | Rs.190.78 credits was used by you | food
3 zxcmo5 | DLR.2,000 credits was used by you| football
已使用此
d = dict(zip(XYZ['PRICE'],XYZ['TYPE']))
pat = (r'({})'.format('|'.join(d.keys())))
ABC['TYPE']=ABC['PRICE'].str.extract(pat,expand=False).map(d)
但是像190.78和191.00这样的值不匹配。例如,在处理海量数据时,
应该与190.77之类的食物值匹配,例如190.77与为其分配了其他值的食物不匹配。 198.78也与其他应与食物匹配的食物不匹配
But values likes 190.78 and 191.00 are getting mismatched. for example while working with huge data 190.78 should be matched with food values like 190.77 gets mismatched with food where it has other value assigned to it. And 198.78 also gets mismatched with some other ones where it should match with food
推荐答案
df
id price type
0 easdca Rs.1,599.00 was trasn by you unknown
1 vbbngy txn of INR 191.00 using unknown
2 awerfa Rs.190.78 credits was used by you unknown
3 zxcmo5 DLR.2000 credits was used by you unknown
df2
price type
0 190.78 food
1 191.00 movie
2 2,000 football
3 1,599.00 basketball
使用 re
df['price_'] = df['price'].apply(lambda x: re.findall(r'(?<=[\.\s])[\d\.]+',x.replace(',',''))[0])
df2.columns = ['price_','type']
df2['price_'] = df2['price_'].str.repalce(',','')
更改浮动类型
df2['price_'] = df2['price_'].astype(float)
df['price_'] = df['price_'] .astype(float)
使用 pd.merge
df = df.merge(df2, on='price_')
df.drop('type_x', axis=1)
输出
id price price_ type_y
0 easdca Rs.1,599.00 was trasn by you 1599.00 basketball
1 vbbngy txn of INR 191.00 using 191.00 movie
2 awerfa Rs.190.78 credits was used by you 190.78 food
3 zxcmo5 DLR.2000 credits was used by you 2000 football
这篇关于如何匹配和合并两个具有完全不同值的数据框(除了数据框列中的数字)?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
查看全文