ValueError in rank method in pandas without more explanation

Views: 62. Posted: 2020/5/24 2:18:41. Tags: python, pandas


Problem description

I have a pandas DataFrame like this:

     year   week           city  avg_rank
0    2016     52          Paris         1
1    2016     52 Gif-sur-Yvette         2
2    2016     52          Paris         1
3    2017      1          Paris         4
4    2016     52          Paris         3
5    2016     52          Paris         5
6    2016     52          Paris         2

But this line of code:

df['real_index']=df.groupby(by=['year', 'week', 'city']).avg_rank.rank(method='first')

produces this stack trace:

/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.pyc in rank(self, axis, method, numeric_only, na_option, ascending, pct)

/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.pyc in wrapper(*args, **kwargs)
590                                                                 *args, **kwargs)
591                         except(AttributeError):
592                             raise ValueError
593
594             return wrapper

ValueError:

There are no NaN values in those columns of my DataFrame.

I am using Python 2.7 with pandas 0.18.1 and numpy 1.11.0.

My DataFrame has about 9,000,000 rows and 15 columns.

What is more intriguing is that when I run this line on each subset of my DataFrame separately (subsets of 1,000,000 rows each), no ValueError is raised.

Is it known behavior that pandas does not handle very large DataFrames well, or did I miss something?

Any help is welcome!

Recommended answer

Since my DataFrame came from several files, I noticed that some index values were duplicated.

Running

df.index = np.arange(df.shape[0])

right after loading the data fixed it.
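As a minimal sketch of that fix on the sample data from the question: the two frames below (`part_a`, `part_b`) are hypothetical stand-ins for data loaded from separate files, so each carries its own 0-based index and concatenating them yields repeated labels. Recent pandas versions may not raise the error at all; the point here is only to show how to check for the condition and reset the index before ranking.

```python
import numpy as np
import pandas as pd

# Hypothetical chunks standing in for data loaded from separate files;
# each one has its own 0-based index.
part_a = pd.DataFrame({
    "year": [2016, 2016, 2016, 2017],
    "week": [52, 52, 52, 1],
    "city": ["Paris", "Gif-sur-Yvette", "Paris", "Paris"],
    "avg_rank": [1, 2, 1, 4],
})
part_b = pd.DataFrame({
    "year": [2016, 2016, 2016],
    "week": [52, 52, 52],
    "city": ["Paris", "Paris", "Paris"],
    "avg_rank": [3, 5, 2],
})

# Plain concatenation keeps the original labels: 0, 1, 2, 3, 0, 1, 2.
df = pd.concat([part_a, part_b])
has_duplicates_before = not df.index.is_unique

# The fix from the answer: replace the index with a fresh 0..n-1 range.
df.index = np.arange(df.shape[0])
has_duplicates_after = not df.index.is_unique

# Rank within each (year, week, city) group; ties are broken by row order.
df["real_index"] = (
    df.groupby(by=["year", "week", "city"])["avg_rank"].rank(method="first")
)
print(df)
```

`df.reset_index(drop=True)` or `pd.concat([...], ignore_index=True)` should give the same unique index without an explicit `np.arange`.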

Indeed, my hypothesis is that some groups in the groupby contained rows that shared the same index value.
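One way to test this hypothesis is to look for rows that share both their group keys and their index label. The sketch below uses a small made-up frame, and the helper column name `_idx` is purely illustrative. It is a diagnostic under those assumptions, not a reproduction of the original error.

```python
import pandas as pd

# Hypothetical frame whose index labels repeat, as after naive concatenation.
df = pd.DataFrame(
    {
        "year": [2016, 2016, 2016, 2017],
        "week": [52, 52, 52, 1],
        "city": ["Paris", "Gif-sur-Yvette", "Paris", "Paris"],
        "avg_rank": [1, 2, 1, 4],
    },
    index=[0, 1, 0, 1],
)

keys = ["year", "week", "city"]

# Copy the index labels into a helper column, then flag rows whose
# (group keys, index label) combination occurs more than once.
with_idx = df.assign(_idx=df.index.to_numpy())
clash = with_idx.duplicated(subset=keys + ["_idx"], keep=False)

# Rows that share an index label with another row of the same group.
clashing_rows = df[clash.to_numpy()]
print(clashing_rows)
```

In this toy data, label 0 repeats inside the Paris group of week 52, while label 1 repeats only across different groups. A subset of the data can easily contain only the second kind of repetition, which is consistent with the error not appearing on smaller slices.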

When I tried with subsets of my DataFrame, this case (fortunately or unfortunately) never occurred.

However, the error message is not very informative.
