Create new column in dataframe using fuzzywuzzy
Question
I have a dataframe in pandas where I am using the fuzzywuzzy package in Python to match the first column in the dataframe with the second column.
I have defined a function to create an output with the first column, the second column, and the partial ratio score, but it is not working.
Could you please help?
import csv
import sys
import os
import numpy as np
import pandas as pd
from fuzzywuzzy import fuzz
from fuzzywuzzy import process

def match(driver):
    driver["score"]=driver.apply(lambda row: fuzz.partial_ratio(row driver[driver.columns[0]], driver[driver.columns[1]]), axis=1)
    print(driver)
    return(driver)
Thanks
-算盘
Recommended answer
Inside the function you give to apply, you are passed a Series to work with, representing the current row here (because of axis=1). In your code, you're effectively ignoring this Series and trying to call partial_ratio with the two whole columns of the DataFrame each time (driver[col]).
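To make the point above concrete, here is a small sketch of what the function sees on each call. The column names 'one' and 'two' and the sample strings are illustrative assumptions, not taken from the question's data.

```python
import pandas as pd

# Illustrative data; the column names are assumptions for this sketch.
d = pd.DataFrame({"one": ["fuzz", "wuzz"], "two": ["fizz", "woo"]})

# With axis=1, the function is called once per row and receives a Series.
row_types = d.apply(lambda row: type(row).__name__, axis=1)

# Referring to d["one"] inside the function ignores the row:
# it is the entire column (length 2) on every call.
col_lengths = d.apply(lambda row: len(d["one"]), axis=1)

# Indexing the row itself yields the single value for that row.
row_values = d.apply(lambda row: row["one"], axis=1)

print(row_types.tolist(), col_lengths.tolist(), row_values.tolist())
```

The second result is the same for every row, which is the symptom of passing whole columns instead of the current row's values.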
A minor change to your code should hopefully give you what you want.
d = pd.DataFrame({'one': ['fuzz', 'wuzz'], 'two': ['fizz', 'woo']})
# s is the current row, so index it by column name
d.apply(lambda s: fuzz.partial_ratio(s['one'], s['two']), axis=1)
0    75
1    33
dtype: int64
(Interestingly, the partial_ratio function will accept a Series as input, but only because it converts it internally into a string. :)
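Applied to the function from the question, the same change might look like the sketch below. It keeps the positional column lookup (driver.columns[0] and driver.columns[1]) but reads the values from the row. The scorer parameter, the copy of the input frame, the str() conversion, and the difflib fallback are assumptions added for this illustration; the fallback only lets the sketch run when fuzzywuzzy is not installed, and it produces different scores from fuzz.partial_ratio.

```python
import pandas as pd

try:
    from fuzzywuzzy import fuzz
    default_scorer = fuzz.partial_ratio
except ImportError:
    # Stand-in (assumption): a plain similarity ratio, NOT partial_ratio.
    from difflib import SequenceMatcher

    def default_scorer(a, b):
        return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def match(driver, scorer=default_scorer):
    """Return a copy of driver with a 'score' column for its first two columns."""
    first, second = driver.columns[0], driver.columns[1]
    scored = driver.copy()
    # axis=1 passes each row as a Series; take the two values from that row.
    scored["score"] = scored.apply(
        lambda row: scorer(str(row[first]), str(row[second])), axis=1
    )
    return scored


driver = pd.DataFrame({"one": ["fuzz", "wuzz"], "two": ["fizz", "woo"]})
result = match(driver)
print(result)
```

Returning a copy rather than modifying driver in place is a design choice for this sketch; assigning the column directly on driver, as in the question, works too.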