ValueError in rank method in pandas without more explanation

Question

I have a pandas DataFrame like this:

     year   week           city  avg_rank
0    2016     52          Paris         1
1    2016     52 Gif-sur-Yvette         2
2    2016     52          Paris         1
3    2017      1          Paris         4
4    2016     52          Paris         3
5    2016     52          Paris         5
6    2016     52          Paris         2

But this line of code:

df['real_index']=df.groupby(by=['year', 'week', 'city']).avg_rank.rank(method='first')

produces this stack trace:

/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.pyc in rank(self, axis, method, numeric_only, na_option, ascending, pct)

/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.pyc in wrapper(*args, **kwargs)
590                                                                 *args, **kwargs)
591                         except(AttributeError):
592                             raise ValueError
593
594             return wrapper

ValueError:

There are no NaN values in those columns of my DataFrame.

I am using Python 2.7 along with pandas 0.18.1 and numpy 1.11.0.

My DataFrame has about 9,000,000 rows and 15 columns.

What is more intriguing is that when I run this line on each subset of my DataFrame (subsets of 1,000,000 rows each), no ValueError is raised.

Is it a known behavior that pandas does not handle very large DataFrames well, or did I miss something?

Any help is welcome!

Recommended answer

Since my DataFrame came from several files, I noticed that some index values were duplicated.
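As a minimal sketch of how this situation can arise and be detected: the two small frames below are made-up stand-ins for data loaded from separate files, and the names part_a, part_b and combined are hypothetical. Each chunk carries its own 0..n-1 index, so concatenating them without further options repeats the labels.

```python
import pandas as pd

# Two hypothetical chunks, standing in for DataFrames loaded from separate files.
part_a = pd.DataFrame({"city": ["Paris", "Paris"], "avg_rank": [1, 3]})
part_b = pd.DataFrame({"city": ["Paris", "Gif-sur-Yvette"], "avg_rank": [2, 1]})

# Each chunk has its own 0..n-1 index, so a plain concat repeats the labels 0 and 1.
combined = pd.concat([part_a, part_b])

# True only when every index label occurs exactly once.
index_is_unique = combined.index.is_unique

# Rows whose index label occurs more than once (keep=False marks every occurrence).
duplicated_rows = combined[combined.index.duplicated(keep=False)]

print(index_is_unique)
print(duplicated_rows)
```

Checking df.index.is_unique right after loading is a cheap way to spot the problem before calling groupby.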

Using

df.index = np.arange(df.shape[0])

just after loading the data, it now works.
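The same reset can be sketched with other standard pandas idioms. The frame below uses made-up values with a deliberately repeated index. reset_index(drop=True) is not from the answer itself, only an equivalent way to obtain a fresh 0..n-1 index, and pd.concat(frames, ignore_index=True) would avoid the duplicates at load time.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a repeated index label, as a concat of several files might produce.
df = pd.DataFrame(
    {
        "year": [2016, 2016, 2016],
        "week": [52, 52, 52],
        "city": ["Paris", "Paris", "Paris"],
        "avg_rank": [1, 3, 2],
    },
    index=[0, 1, 0],
)

# The fix from the answer: overwrite the index with 0..n-1.
by_arange = df.copy()
by_arange.index = np.arange(by_arange.shape[0])

# Equivalent idiom: drop the old index and build a fresh one.
fixed = df.reset_index(drop=True)

# With a unique index, rank within each (year, week, city) group.
fixed["real_index"] = (
    fixed.groupby(by=["year", "week", "city"]).avg_rank.rank(method="first")
)
print(fixed)
```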

Indeed, my hypothesis is that some groups in the groupby contained rows sharing the same index value.

When I tried with subsets of my DataFrame, this case fortunately/unfortunately never occurred.

However, the error message is not very informative.
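The empty message matches what the traceback shows: the wrapper catches an AttributeError and raises a bare ValueError. Here is a simplified sketch of that pattern. make_wrapper and failing are hypothetical names for illustration, not pandas code.

```python
def make_wrapper(func):
    # Hypothetical stand-in for the try/except pattern visible in the traceback.
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except AttributeError:
            # A bare ValueError carries no message of its own.
            raise ValueError
    return wrapper


def failing():
    # Hypothetical underlying failure whose message gets lost.
    raise AttributeError("the real cause")


try:
    make_wrapper(failing)()
except ValueError as exc:
    message = str(exc)        # empty string
    original = exc.__context__  # Python 3 only: the swallowed AttributeError

print(repr(message))
print(repr(original))
```

On Python 3 the original exception is still reachable through __context__. On Python 2.7, as used in the question, there is no such chaining, so only the bare ValueError is reported.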
