Why doesn't pandas reindex() operate in-place?
From the reindex docs:
Conform DataFrame to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False.
Therefore, I thought that setting copy=False would give me a reordered DataFrame in place (!). It appears, however, that I do get a copy and need to assign it to the original object again. I don't want to assign it back if I can avoid it (the reason comes from this other question).
This is what I am doing:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(5, 5))
df.columns = [ 'a', 'b', 'c', 'd', 'e' ]
df.head()
Output:
          a         b         c         d         e
0  0.234296  0.011235  0.664617  0.983243  0.177639
1  0.378308  0.659315  0.949093  0.872945  0.383024
2  0.976728  0.419274  0.993282  0.668539  0.970228
3  0.322936  0.555642  0.862659  0.134570  0.675897
4  0.167638  0.578831  0.141339  0.232592  0.976057
reindex gives me the correct output, but I'd need to assign it back to the original object, which is what I wanted to avoid by using copy=False:
df.reindex( columns=['e', 'd', 'c', 'b', 'a'], copy=False )
The desired output after that line is:
          e         d         c         b         a
0  0.177639  0.983243  0.664617  0.011235  0.234296
1  0.383024  0.872945  0.949093  0.659315  0.378308
2  0.970228  0.668539  0.993282  0.419274  0.976728
3  0.675897  0.134570  0.862659  0.555642  0.322936
4  0.976057  0.232592  0.141339  0.578831  0.167638
Why is copy=False not working in place?
Is it possible to do that at all?
Working with Python 3.5.3, pandas 0.23.3.
reindex is a structural change, not a cosmetic or transformative one. As such, a copy is always returned, because the operation cannot be done in place (it would require allocating new memory for the underlying arrays, etc.). This means you have to assign the result back; there's no other choice.
df = df.reindex(['e', 'd', 'c', 'b', 'a'], axis=1)
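To make the point concrete, here is a minimal sketch (the variable names `new_order`, `result`, and the three boolean flags are illustrative, not from the question) showing that a reordering reindex leaves the original frame untouched and hands back a separate object, so reassignment is what actually updates `df`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(5, 5), columns=list("abcde"))
new_order = ["e", "d", "c", "b", "a"]

# Reindexing to a different column order returns a new DataFrame.
result = df.reindex(columns=new_order)

original_untouched = list(df.columns) == list("abcde")  # df keeps its old order
result_reordered = list(result.columns) == new_order    # the result has the new order
is_new_object = result is not df                        # and it is a distinct object

print(original_untouched, result_reordered, is_new_object)

# Rebinding the name is the only way to make `df` refer to the reordered frame.
df = df.reindex(columns=new_order)
print(list(df.columns))
```

Note that rebinding `df` only changes what that name points to; any other reference to the old frame still sees the original column order.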
Also see the discussion on GH21598.
The one corner case where copy=False is actually of any use is when the indices used to reindex df are identical to the ones it already has. You can check by comparing the ids:
id(df)
# 4839372504
id(df.reindex(df.index, copy=False)) # same object returned
# 4839372504
id(df.reindex(df.index, copy=True)) # new object created - ids are different
# 4839371608
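The same check can be sketched with the `is` operator on named results, which avoids comparing the ids of temporary objects (an id can be reused once a temporary has been garbage-collected). This is only a sketch: the names `same` and `copied` are illustrative, and the identity result for copy=False is printed rather than assumed, because it depends on the pandas version (newer releases change how the `copy` keyword is handled and may emit a deprecation warning for it).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(5, 5), columns=list("abcde"))

# Reindex with the index the frame already has.
same = df.reindex(df.index, copy=False)
copied = df.reindex(df.index, copy=True)

# On the pandas version used in the answer, this is True (same object returned).
# Treat the result as version-dependent rather than guaranteed.
print("copy=False returned the same object:", same is df)

# With copy=True, a distinct object holding equal data comes back.
print("copy=True returned the same object:", copied is df)
print("data equal:", copied.equals(df))
```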