Why did 'reset_index(drop=True)' unexpectedly remove a column?
Question
I have a Pandas dataframe named data_match. It contains the columns '_worker_id', '_unit_id', and 'caption'. (Please see the attached screenshot for some of the rows in this dataframe.)
Suppose the index is not in ascending order and I want it to be 0, 1, 2, 3, 4...n. I ran the following statement to reset the index:
data_match = data_match.reset_index(drop=True)
On my computer, using Python 3.6, this returned the correct output. However, when my coworker ran the same statement on his computer, also using Python 3.6, the '_worker_id' column was removed.
Is this caused by the 'drop=True' argument passed to 'reset_index'? If so, I don't understand why it worked on my computer but not on my coworker's. Can anybody advise?
Recommended answer
As the saying goes, "What happens in your interpreter stays in your interpreter." It is impossible to explain the discrepancy without seeing the full history of commands entered into both Python interactive sessions.
However, it is possible to venture a guess: df.reset_index(drop=True) drops the current index of the DataFrame and replaces it with an index of increasing integers. It never drops columns.
So, in your interactive session, _worker_id was a column. In your coworker's interactive session, _worker_id must have been an index level.
The visual difference can be somewhat subtle. For example, below, df has a _worker_id column while df2 has a _worker_id index level:
In [190]: df = pd.DataFrame({'foo':[1,2,3], '_worker_id':list('ABC')}); df
Out[190]:
  _worker_id  foo
0          A    1
1          B    2
2          C    3

In [191]: df2 = df.set_index('_worker_id', append=True); df2
Out[191]:
              foo
  _worker_id
0 A             1
1 B             2
2 C             3
Notice that the name _worker_id appears one line below foo when it is an index level, and on the same line as foo when it is a column. That is the only visual clue you get when looking at the str or repr of a DataFrame.
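Since the printed output is easy to misread, it can help to ask the DataFrame directly. The following is a minimal sketch, not part of the original answer: describe_name is a hypothetical helper introduced only for illustration, and it relies on the standard columns and index.names attributes.

```python
import pandas as pd

# Same example frames as in the answer: df has a '_worker_id' column,
# df2 has '_worker_id' as an index level.
df = pd.DataFrame({'foo': [1, 2, 3], '_worker_id': list('ABC')})
df2 = df.set_index('_worker_id', append=True)


def describe_name(frame, name):
    """Hypothetical helper: say whether `name` is a column, an index level, or absent."""
    if name in frame.columns:
        return 'column'
    if name in frame.index.names:
        return 'index level'
    return 'missing'


print(describe_name(df, '_worker_id'))   # column
print(describe_name(df2, '_worker_id'))  # index level
print(list(df2.index.names))             # [None, '_worker_id']
```

Running the same check in both sessions before calling reset_index would show whether the two DataFrames actually have the same structure.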
So to repeat: when _worker_id is a column, the column is unaffected by df.reset_index(drop=True):
In [194]: df.reset_index(drop=True)
Out[194]:
  _worker_id  foo
0          A    1
1          B    2
2          C    3
But _worker_id is dropped when it is part of the index:
In [195]: df2.reset_index(drop=True)
Out[195]:
   foo
0    1
1    2
2    3
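If the goal is to renumber the rows while keeping _worker_id, one possible approach is to move that index level back into the columns first and only then discard the remaining index. This is a sketch under the assumption that the coworker's DataFrame looks like df2 above; the original answer does not prescribe this fix, and the variable names restored and renumbered are illustrative.

```python
import pandas as pd

# Rebuild the example where '_worker_id' is an index level.
df = pd.DataFrame({'foo': [1, 2, 3], '_worker_id': list('ABC')})
df2 = df.set_index('_worker_id', append=True)

# Step 1: turn only the '_worker_id' level back into a regular column.
# The other (unnamed) index level stays in place as the index.
restored = df2.reset_index(level='_worker_id')

# Step 2: drop the remaining index and replace it with 0..n-1.
# '_worker_id' is a column now, so it is left untouched.
renumbered = restored.reset_index(drop=True)

print(renumbered)
```

Calling reset_index() without drop=True would also preserve _worker_id, but it would additionally turn every other index level into a column.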