Edit distance between two pandas columns

Problem description

I have a pandas DataFrame consisting of two columns of strings. I would like to create a third column containing the Edit Distance of the two columns.

from nltk.metrics import edit_distance    
df['edit'] = edit_distance(df['column1'], df['column2'])

For some reason this seems to go into some sort of infinite loop: it stays unresponsive for quite some time, and I then have to terminate it manually.

Any suggestions are welcome.

Recommended answer

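NLTK's edit_distance function compares one pair of strings. Passed two whole columns, it most likely treats each column as a single long sequence and fills a table with one cell per pair of rows in pure Python, which would explain the apparent hang on a large DataFrame. To compute the edit distance between corresponding pairs of strings, apply it separately to each row's strings like this: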

results = df.apply(lambda x: edit_distance(x["column1"], x["column2"]), axis=1)

Or like this (probably a little more efficient), to avoid including the irrelevant columns of the dataframe:

results = df.loc[:, ["column1", "column2"]].apply(lambda x: edit_distance(*x), axis=1)

To add the results to your dataframe, you'd use it like this:

df["distance"] = df.loc[:, ["column1","column2"]].apply(lambda x: edit_distance(*x), axis=1)
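As a possible alternative to apply, the same per-row computation can be sketched with zip and a list comprehension, which is often faster than apply with axis=1 because it does not build a Series for every row. In this sketch, levenshtein is a hypothetical pure-Python stand-in for nltk's edit_distance (assumed to match its default settings), included only so the example runs without NLTK; the sample data is made up.

```python
import pandas as pd


def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance between two strings.

    Hypothetical stand-in for nltk.metrics.edit_distance so that this
    sketch runs without NLTK installed.
    """
    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


# Made-up sample data using the column names from the question.
df = pd.DataFrame({
    "column1": ["kitten", "flaw", "same"],
    "column2": ["sitting", "lawn", "same"],
})

# One call per row: zip pairs up the values of the two columns.
df["distance"] = [levenshtein(a, b) for a, b in zip(df["column1"], df["column2"])]
print(df)
```

With NLTK installed, swapping levenshtein for edit_distance in the list comprehension should produce the same column.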
