如何匹配和合并两个具有完全不同值的数据框(除了数据框列中的数字)? [英] How to match and merge two dataframes having completely different values except numericals in columns of dataframe?

查看:106
本文介绍了如何匹配和合并两个具有完全不同值的数据框(除了数据框列中的数字)?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

具有值的数据框ABC

have a dataframe ABC of value

      id         |     price                          |   type
0     easdca     | Rs.1,599.00 was trasn by you       | unknown
1     vbbngy     | txn of INR 191.00 using            | unknown
2     awerfa     | Rs.190.78 credits was used by you  | unknown
3     zxcmo5     | DLR.2000 credits was used by you   | unknown

和其他XYZ值

         price          |   type
0      190.78           | food
1      191.00           | movie
2      2,000            | football
3      1,599.00         | basketball

如何使用ABC映射XYZ,以便ABC中的类型使用值更新为xyz类型(数字)XYZ价格。

how to map XYZ with ABC ,so that type in ABC get updated with type in xyz using values(numericals) in price of XYZ .

我需要的输出

       id         |     price                          |   type
0     easdca     | Rs.1,599.00 was trasn by you        | basketball
1     vbbngy     | txn of INR 191.00 using             | movie
2     awerfa     | Rs.190.78 credits was used by you   | food
3     zxcmo5     | DLR.2,000 credits was used by you| football

已使用此

d = dict(zip(XYZ['PRICE'],XYZ['TYPE']))

pat = (r'({})'.format('|'.join(d.keys())))

ABC['TYPE']=ABC['PRICE'].str.extract(pat,expand=False).map(d)

但是像190.78和191.00这样的值不匹配。例如,在处理海量数据时,
应该与190.77之类的食物值匹配,例如190.77与为其分配了其他值的食物不匹配。 198.78也与其他应与食物匹配的食物不匹配

But values likes 190.78 and 191.00 are getting mismatched. for example while working with huge data 190.78 should be matched with food values like 190.77 gets mismatched with food where it has other value assigned to it. And 198.78 also gets mismatched with some other ones where it should match with food

推荐答案

df

        id                price                                type
0       easdca        Rs.1,599.00 was trasn by you          unknown
1       vbbngy        txn of INR 191.00 using               unknown
2       awerfa        Rs.190.78 credits was used by you     unknown
3       zxcmo5        DLR.2000 credits was used by you      unknown

df2

           price                   type
0        190.78                    food
1        191.00                   movie
2        2,000                 football
3        1,599.00            basketball

使用 re

df['price_'] = df['price'].apply(lambda x: re.findall(r'(?<=[\.\s])[\d\.]+',x.replace(',',''))[0])
df2.columns = ['price_','type']
df2['price_'] = df2['price_'].str.repalce(',','')

更改浮动类型

df2['price_']  = df2['price_'].astype(float)
df['price_']  = df['price_'] .astype(float)

使用 pd.merge

df = df.merge(df2, on='price_')
df.drop('type_x', axis=1)

输出

                id                                 price   price_       type_y
0      easdca        Rs.1,599.00 was trasn by you         1599.00   basketball
1      vbbngy        txn of INR 191.00 using               191.00        movie
2      awerfa        Rs.190.78 credits was used by you     190.78         food
3      zxcmo5        DLR.2000 credits was used by you        2000     football

这篇关于如何匹配和合并两个具有完全不同值的数据框(除了数据框列中的数字)?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆