Is it possible to do a fuzzy match merge with python pandas?

Views: 711 | Published: 2020/5/23 21:11:14 | Tags: python, pandas

Problem description

I have two DataFrames that I want to merge on a column. However, because of alternate spellings, differing numbers of spaces, and the absence or presence of diacritical marks, I would like the merge to succeed as long as the values are similar to one another.

Any similarity algorithm will do (soundex, Levenshtein, difflib's).

Say one DataFrame has the following data:

from pandas import DataFrame

df1 = DataFrame([[1],[2],[3],[4],[5]], index=['one','two','three','four','five'], columns=['number'])

       number
one         1
two         2
three       3
four        4
five        5

df2 = DataFrame([['a'],['b'],['c'],['d'],['e']], index=['one','too','three','fours','five'], columns=['letter'])

      letter
one        a
too        b
three      c
fours      d
five       e

Then I want to get the resulting DataFrame:

       number letter
one         1      a
two         2      b
three       3      c
four        4      d
five        5      e

Recommended answer

Similar to @locojay's suggestion, you can apply difflib's get_close_matches to df2's index to map each label to its closest label in df1's index, and then join:

In [23]: import difflib 

In [24]: difflib.get_close_matches
Out[24]: <function difflib.get_close_matches>

In [25]: df2.index = df2.index.map(lambda x: difflib.get_close_matches(x, df1.index)[0])

In [26]: df2
Out[26]: 
      letter
one        a
two        b
three      c
four       d
five       e

In [31]: df1.join(df2)
Out[31]: 
       number letter
one         1      a
two         2      b
three       3      c
four        4      d
five        5      e
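Note that get_close_matches returns an empty list when nothing is similar enough, so indexing with [0] raises an IndexError for such labels. The following is a minimal sketch of one way to guard against that, not part of the original answer. The helper name `closest`, the extra unmatched row `zzz`, and the choice to drop unmatched rows are assumptions made for illustration; `cutoff=0.6` is difflib's default similarity threshold.

```python
import difflib

import pandas as pd

df1 = pd.DataFrame({"number": [1, 2, 3, 4, 5]},
                   index=["one", "two", "three", "four", "five"])
# Hypothetical extra row "zzz" that has no close match in df1.
df2 = pd.DataFrame({"letter": ["a", "b", "c", "d", "e", "f"]},
                   index=["one", "too", "three", "fours", "five", "zzz"])


def closest(label, candidates, cutoff=0.6):
    """Return the best difflib match for label, or None if none clears cutoff."""
    matches = difflib.get_close_matches(label, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


candidates = list(df1.index)
matched = [closest(label, candidates) for label in df2.index]

# Keep only the df2 rows that found a partner, then relabel them.
mask = [m is not None for m in matched]
df2_matched = df2.loc[mask].copy()
df2_matched.index = [m for m in matched if m is not None]

result = df1.join(df2_matched)
print(result)
```

Raising `cutoff` makes the matching stricter; whether unmatched rows should be dropped, kept, or reported depends on the data.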

If these were columns, in the same vein you could apply it to the column and then merge:

df1 = DataFrame([[1,'one'],[2,'two'],[3,'three'],[4,'four'],[5,'five']], columns=['number', 'name'])
df2 = DataFrame([['a','one'],['b','too'],['c','three'],['d','fours'],['e','five']], columns=['letter', 'name'])

df2['name'] = df2['name'].apply(lambda x: difflib.get_close_matches(x, df1['name'])[0])
df1.merge(df2)
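The column version above overwrites df2's original spellings. As a hedged variation (an assumption for illustration, not the answerer's code), the sketch below stores the matched key in a separate column so the original value survives the merge. The column name `name_df2` and the helper `closest` are hypothetical.

```python
import difflib

import pandas as pd

df1 = pd.DataFrame({"number": [1, 2, 3, 4, 5],
                    "name": ["one", "two", "three", "four", "five"]})
df2 = pd.DataFrame({"letter": ["a", "b", "c", "d", "e"],
                    "name": ["one", "too", "three", "fours", "five"]})

candidates = df1["name"].tolist()


def closest(name):
    """Return the closest name in df1, or None when difflib finds no match."""
    matches = difflib.get_close_matches(name, candidates, n=1)
    return matches[0] if matches else None


# Keep df2's original spelling under a new column and merge on the matched key.
df2_keyed = df2.rename(columns={"name": "name_df2"})
df2_keyed["name"] = df2_keyed["name_df2"].map(closest)

# A left merge keeps every df1 row even if no df2 row matched it.
merged = df1.merge(df2_keyed, on="name", how="left")
print(merged)
```

Keeping both spellings side by side makes it easier to review which fuzzy matches were actually made.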
