Data Conversion Error while applying a function to each row in pandas Python

Problem description

I have a pandas DataFrame in Python that looks something like this -

    contest_login_count  contest_participation_count  ipn_ratio
0                    1                            1   0.000000
1                    3                            3   0.083333
2                    3                            3   0.000000
3                    3                            3   0.066667
4                    5                           13   0.102804
5                    2                            3   0.407407
6                    1                            3   0.000000
7                    1                            2   0.000000
8                   53                           91   0.264151
9                    1                            2   0.000000

Now I want to apply a function to each row of this DataFrame. The function is written like this -

def findCluster(clusterModel,data):
    return clusterModel.predict(data)

I apply this function to each row in this manner -

df_fil.apply(lambda x : findCluster(cluster_all,x.reshape(1,-1)),axis=1)

When I run this code, I get a warning saying -

DataConversionWarning: Data with input dtype object was converted to float64.

warnings.warn(msg, DataConversionWarning)

This warning is printed once for each row. Since I have around 450K rows in my DataFrame, my computer hangs while printing all these warning messages in the IPython notebook.

To test my function, I created a dummy DataFrame and applied the same function to it, and that works well. Here is the code -

t = pd.DataFrame([[10.35,100.93,0.15],[10.35,100.93,0.15]])
t.apply(lambda x:findCluster(cluster_all,x.reshape(1,-1)),axis=1)

The output of this is -

   0  1  2
0  4  4  4
1  4  4  4

Can anyone suggest what I am doing wrong, or what I can change to make this warning go away?

Solution

I think the problem is that the dtype of some column is not float.
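One way to check this is to look at the column dtypes. The following is a minimal sketch on a small made-up frame, assuming one column was loaded as strings; the data and the name `df` are illustrative and not the asker's real frame:

```python
import pandas as pd

# Hypothetical frame: ipn_ratio holds strings, so its dtype is object.
df = pd.DataFrame({
    "contest_login_count": [1, 3, 5],
    "contest_participation_count": [1, 3, 13],
    "ipn_ratio": pd.Series(["0.000000", "0.083333", "0.102804"], dtype=object),
})

# Show the dtype of every column.
print(df.dtypes)

# Columns with object dtype are the likely culprits.
object_cols = [col for col in df.columns if df[col].dtype == object]
print(object_cols)

# With axis=1, apply passes each row as a Series. A single object column
# is enough to make the whole row object dtype.
row_dtype = df.iloc[0].dtype
print(row_dtype)
```

If every column were already numeric, each row would be float64 and the model would have nothing to convert, which would explain why the all-float dummy frame in the question runs without the warning.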

You need to cast it with astype:

df['colname'] = df['colname'].astype(float)
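If several columns are affected, the whole frame can be cast in one step. A `predict` method that accepts a 2-D array of rows, as scikit-learn style estimators do, can then be called once on the full array instead of once per row. The sketch below rests on assumptions: `NearestCentroidModel` is a hypothetical stand-in for `cluster_all`, written only so the example runs without the asker's fitted model, and the data is made up.

```python
import numpy as np
import pandas as pd


class NearestCentroidModel:
    """Hypothetical stand-in for cluster_all (not the asker's model).

    predict(X) returns, for each row of X, the index of the nearest centroid.
    """

    def __init__(self, centroids):
        self.centroids = np.asarray(centroids, dtype=float)

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Distance from every row to every centroid, then pick the closest.
        dists = np.linalg.norm(X[:, None, :] - self.centroids[None, :, :], axis=2)
        return dists.argmin(axis=1)


# Made-up data; ipn_ratio is deliberately stored as object dtype.
df_fil = pd.DataFrame({
    "contest_login_count": [1, 3, 53],
    "contest_participation_count": [1, 3, 91],
    "ipn_ratio": pd.Series(["0.000000", "0.083333", "0.264151"], dtype=object),
})

# Cast every column to float in one step.
df_float = df_fil.astype(float)

cluster_all = NearestCentroidModel([[2.0, 2.0, 0.05], [50.0, 90.0, 0.25]])

# One predict call on the whole 2-D array instead of one call per row.
labels = cluster_all.predict(df_float.to_numpy())
df_float["cluster"] = labels
print(df_float)
```

Casting once up front means no per-row conversion is needed, and a single batched call also avoids the overhead of 450K separate `predict` calls.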
