Dataframe apply doesn't accept axis argument
I have two dataframes: data and rules.
>>> data              >>> rules
   vendor                rule
0  googel             0  google
1  google             1  dell
2  googly             2  macbook
I am trying to add two new columns to the data dataframe after computing the Levenshtein similarity between each vendor and rule. Ideally, my dataframe would contain columns like this:
>>> data
   vendor  rule    similarity
0  googel  google  0.8
So far I am trying to use an apply call that returns this structure, but the dataframe apply is not accepting the axis argument.
>>> for index,r in rules.iterrows():
...     data[['rule','similarity']]=data['vendor'].apply(lambda row:[r[0],ratio(row[0],r[0])],axis=1)
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/home/mnnr/test/env/test-1.0/runtime/lib/python3.4/site-packages/pandas/core/series.py", line 2220, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/src/inference.pyx", line 1088, in pandas.lib.map_infer (pandas/lib.c:62658)
  File "/home/mnnr/test/env/test-1.0/runtime/lib/python3.4/site-packages/pandas/core/series.py", line 2209, in <lambda>
    f = lambda x: func(x, *args, **kwds)
TypeError: <lambda>() got an unexpected keyword argument 'axis'
Could someone please help me figure out what I am doing wrong? Any change I make just creates new errors. Thank you.
You're calling the Series version of apply, for which an axis argument doesn't make sense. The extra keyword is passed straight through to your lambda, hence the error.
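To make the difference concrete, here is a minimal sketch. The sample frame mirrors the question, and `str.upper` is only a stand-in for the real per-value work; the names `series_error` and `upper` are made up for illustration.

```python
import pandas as pd

data = pd.DataFrame({"vendor": ["googel", "google", "googly"]})

# Series.apply has no axis parameter; unknown keywords are forwarded to the
# function itself, so the lambda receives axis=1 and raises TypeError.
try:
    data["vendor"].apply(lambda v: v.upper(), axis=1)
    series_error = None
except TypeError as exc:
    series_error = str(exc)

# DataFrame.apply does have an axis parameter; axis=1 passes each row
# (a Series indexed by column name) to the function.
upper = data[["vendor"]].apply(lambda row: row["vendor"].upper(), axis=1)

print(series_error)
print(upper.tolist())
```

Note the double brackets: `data[["vendor"]]` selects a one-column DataFrame, while `data["vendor"]` selects a Series.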
If you did:
data[['rule','similarity']]=data[['vendor']].apply(lambda row:[r[0],ratio(row[0],r[0])],axis=1)
then data[['vendor']] is a single-column DataFrame, and the DataFrame version of apply does accept axis.
Or just remove the axis arg. The lambda then receives each vendor string directly, so it should not be indexed with [0]:
data[['rule','similarity']]=data['vendor'].apply(lambda vendor:[r[0],ratio(vendor,r[0])])
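If the goal is for a list returned from apply to become two separate columns, one option in recent pandas versions is `result_type="expand"` on `DataFrame.apply`. The sketch below is an assumption-laden illustration, not the original code: `ratio` is a stand-in built on the standard library's `difflib` (its scores differ from the Levenshtein package's), and the single rule `"google"` is hard-coded in place of the loop variable `r`.

```python
from difflib import SequenceMatcher

import pandas as pd


def ratio(a, b):
    # Stand-in for Levenshtein.ratio (assumption): difflib's similarity score.
    return SequenceMatcher(None, a, b).ratio()


data = pd.DataFrame({"vendor": ["googel", "google", "googly"]})
rule = "google"  # hypothetical: one rule, in place of r[0] from the loop

# result_type="expand" turns each returned list into one row of a new frame.
expanded = data[["vendor"]].apply(
    lambda row: [rule, ratio(row["vendor"], rule)],
    axis=1,
    result_type="expand",
)
expanded.columns = ["rule", "similarity"]

# Attach the two new columns; the row index is shared, so join aligns them.
result = data.join(expanded)
print(result)
```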
Update
Looking at what you're doing, you need to calculate the Levenshtein ratio for each rule against every vendor.
You can do this with:
data['vendor'].apply(lambda row: rules['rule'].apply(lambda x: ratio(x, row)))
I think this should calculate the ratio for each vendor against every rule.
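As a rough sketch of where this leads, the nested apply yields a vendors-by-rules score table, from which the best-matching rule per vendor can be picked to get the vendor / rule / similarity layout asked for in the question. Choosing the best match is my reading of the desired output, not something the question states. As an assumption, `ratio` here is a `difflib`-based stand-in for the Levenshtein package, so the exact scores will differ.

```python
from difflib import SequenceMatcher

import pandas as pd


def ratio(a, b):
    # Stand-in for Levenshtein.ratio (assumption): difflib's similarity score.
    return SequenceMatcher(None, a, b).ratio()


data = pd.DataFrame({"vendor": ["googel", "google", "googly"]})
rules = pd.DataFrame({"rule": ["google", "dell", "macbook"]})

# One row per vendor, one column per rule.
scores = data["vendor"].apply(
    lambda vendor: rules["rule"].apply(lambda rule: ratio(rule, vendor))
)
scores.columns = rules["rule"]

# Keep the highest-scoring rule and its score for each vendor.
data["rule"] = scores.idxmax(axis=1)
data["similarity"] = scores.max(axis=1)
print(data)
```

This compares every vendor with every rule, so the work grows with the product of the two lengths; for the three-row frames here that is negligible.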