Why doesn't pandas reindex() operate in-place?

Views: 108 Published: 2020/5/24 2:32:29 Tags: python pandas dataframe reindex

Question

From the reindex docs:

Conform DataFrame to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False.

Therefore, I thought that setting copy=False would reorder the DataFrame in place (!). It appears, however, that I still get a new object and need to assign it back to the original name. I would rather not assign it back if I can avoid it (the reason comes from this other question).

This is what I am doing:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(5, 5))

df.columns = [ 'a', 'b', 'c', 'd', 'e' ]

df.head()

Out:

          a         b         c         d         e
0  0.234296  0.011235  0.664617  0.983243  0.177639
1  0.378308  0.659315  0.949093  0.872945  0.383024
2  0.976728  0.419274  0.993282  0.668539  0.970228
3  0.322936  0.555642  0.862659  0.134570  0.675897
4  0.167638  0.578831  0.141339  0.232592  0.976057

Reindex gives me the correct output, but I'd need to assign it back to the original object, which is what I wanted to avoid by using copy=False:

df.reindex( columns=['e', 'd', 'c', 'b', 'a'], copy=False )

The desired output after that line is:

          e         d         c         b         a
0  0.177639  0.983243  0.664617  0.011235  0.234296
1  0.383024  0.872945  0.949093  0.659315  0.378308
2  0.970228  0.668539  0.993282  0.419274  0.976728
3  0.675897  0.134570  0.862659  0.555642  0.322936
4  0.976057  0.232592  0.141339  0.578831  0.167638

Why is copy=False not working in place?

Is it possible to do that at all?

Working with Python 3.5.3, pandas 0.23.3.

Solution

reindex is a structural change, not a cosmetic or transformative one. As such, a copy is always returned because the operation cannot be done in place (it would require allocating new memory for the underlying arrays, etc.). This means you have to assign the result back; there is no other choice.

df = df.reindex(['e', 'd', 'c', 'b', 'a'], axis=1)
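
To make the point concrete, here is a minimal sketch showing that calling reindex without assignment leaves the original frame untouched, and that only rebinding the name changes what df refers to. The names new_order and reordered are illustrative and not part of the original question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(5, 5), columns=list("abcde"))
new_order = ["e", "d", "c", "b", "a"]  # illustrative name

# Calling reindex without assignment leaves df untouched.
reordered = df.reindex(columns=new_order)
assert list(df.columns) == ["a", "b", "c", "d", "e"]
assert list(reordered.columns) == new_order
assert reordered is not df  # a new object came back

# Assigning the result back rebinds the name df to the new object.
df = df.reindex(columns=new_order)
print(list(df.columns))  # ['e', 'd', 'c', 'b', 'a']
```

Note that rebinding only affects the name df; any other variable that still points at the old frame keeps the old column order.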

Also see the discussion on GH21598.

The one corner case where copy=False is actually of any use is when the indices used to reindex df are identical to the ones it already has. You can check by comparing the ids:

id(df)
# 4839372504

id(df.reindex(df.index, copy=False)) # same object returned 
# 4839372504

id(df.reindex(df.index, copy=True))  # new object created - ids are different
# 4839371608
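
The same check can be written with `is` rather than by comparing printed ids. This is only a sketch: the ids above come from pandas 0.23.3, and as far as I can tell newer pandas releases may hand back a new (shallow) object even in this corner case and may warn that the copy keyword is deprecated, so the snippet reports the result instead of assuming it. The names same_index and returned_same_object are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(5, 5), columns=list("abcde"))

# Reindex with the index the frame already has, asking pandas not to copy.
same_index = df.reindex(df.index, copy=False)

# Whether this is the very same object is version-dependent (assumption),
# so print it rather than assert it.
returned_same_object = same_index is df
print("same object returned:", returned_same_object)

# The contents are equal either way.
print("values equal:", same_index.equals(df))
```

Either way, this corner case does not help with reordering columns, because the requested index then differs from the current one.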
