Create a new column in a DataFrame using fuzzywuzzy


Question

I have a pandas DataFrame, and I am using the fuzzywuzzy package in Python to match the first column of the DataFrame against the second column.

I have defined a function that should produce an output containing the first column, the second column, and the partial ratio score, but it is not working.

Could you please help?

import csv
import sys
import os
import numpy as np
import pandas as pd
from fuzzywuzzy import fuzz
from fuzzywuzzy import process

def match(driver):
    driver["score"]=driver.apply(lambda row: fuzz.partial_ratio(row driver[driver.columns[0]], driver[driver.columns[1]]), axis=1)
    print(driver)
    return(driver)

Thanks

-算盘

Recommended answer

Inside the function you pass to apply, you are given a Series to work with, which here represents the current row (because axis=1). Your code effectively ignores this Series and instead tries to call partial_ratio with the two whole columns of the DataFrame each time (driver[col]).

A minor change to your code should give you what you want: index the row Series by column label instead of indexing the DataFrame.

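import pandas as pd
from fuzzywuzzy import fuzz

d = pd.DataFrame({'one': ['fuzz', 'wuzz'], 'two': ['fizz', 'woo']})

# s is the Series for the current row, so s['one'] and s['two'] are single strings
d.apply(lambda s: fuzz.partial_ratio(s['one'], s['two']), axis=1)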

0    75
1    33
dtype: int64

(Interestingly, the partial_ratio function will accept a Series as input, but only because it converts it internally into a string.)
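Applied to the match function from the question, the same fix might look like the sketch below. It is only an illustration under a few assumptions: the scores are stored in a column named "score" as in the question, the function returns a copy rather than modifying its argument, and values are passed through str() in case a cell is not a string. The difflib fallback is not part of the original answer; it is there only so the sketch still runs where fuzzywuzzy is not installed, and it computes a plain similarity ratio rather than a partial ratio.

```python
import pandas as pd

try:
    # The scorer used in the question and the answer.
    from fuzzywuzzy import fuzz
    scorer = fuzz.partial_ratio
except ImportError:
    # Assumption: fallback so the sketch runs without fuzzywuzzy.
    # This is a plain similarity ratio, NOT a partial ratio.
    from difflib import SequenceMatcher

    def scorer(a, b):
        return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def match(driver):
    """Return a copy of `driver` with a 'score' column comparing its first two columns."""
    first, second = driver.columns[0], driver.columns[1]
    result = driver.copy()
    # `row` is the Series for the current row, so index it by column label
    # instead of reaching back into the whole DataFrame.
    result["score"] = result.apply(
        lambda row: scorer(str(row[first]), str(row[second])), axis=1
    )
    return result


d = pd.DataFrame({"one": ["fuzz", "wuzz", "same"], "two": ["fizz", "woo", "same"]})
scored = match(d)
print(scored)
```

Looking up the two column labels once, outside the lambda, keeps the per-row work down to two lookups on the row and one scorer call.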
