Difference between df.reindex() and df.set_index() methods in pandas


Question

I was confused by this. It is very simple, but I couldn't immediately find the answer on StackOverflow:

  • df.set_index('xcol') makes the column 'xcol' the index (provided 'xcol' is a column of df).

  • df.reindex(myList), however, takes its index labels from outside the dataframe, for example from a list named myList that we defined somewhere else.
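To make the two bullets concrete, here is a minimal sketch. The data and the names 'val' and by_col / by_list are made up for illustration; only 'xcol' and myList come from the question.

```python
import pandas as pd

# Hypothetical data; the values and the 'val' column are illustrative only.
df = pd.DataFrame({'xcol': ['x', 'y', 'z'], 'val': [10, 20, 30]})

# set_index: the new labels come from a column that already exists in df.
by_col = df.set_index('xcol')

# reindex: the labels come from outside df. Rows are looked up by the
# existing index (0, 1, 2); label 5 does not exist, so its row is all NaN.
myList = [2, 0, 5]
by_list = df.reindex(myList)

print(by_col)
print(by_list)
```

In this sketch set_index relabels the rows, while reindex selects rows by their current labels.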

I hope this post clarifies it! Additions to this post are also welcome!

Recommended answer

You can see the difference in a simple example. Let's consider this dataframe:

import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
print(df)
   a  b
0  1  3
1  2  4

The index labels are then 0 and 1.

If you use set_index with the column 'a', the index labels become 1 and 2. If you then evaluate df.set_index('a').loc[1, 'b'], you get 3.

Now, if you use reindex with the same labels 1 and 2, as in df.reindex([1, 2]), then df.reindex([1, 2]).loc[1, 'b'] returns 4.0.
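The two lookups can be run side by side. This is only a sketch built on the answer's own dataframe; the variable names via_set_index and via_reindex are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Label 1 now refers to the row where a == 1, so 'b' is 3.
via_set_index = df.set_index('a').loc[1, 'b']

# Label 1 still refers to the original second row, so 'b' is 4.
# The value comes back as a float because the NaN row for label 2
# forces the column to a floating-point dtype.
via_reindex = df.reindex([1, 2]).loc[1, 'b']

print(via_set_index)  # 3
print(via_reindex)    # 4.0
```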

What happened is that set_index replaced the previous index labels (0, 1) with (1, 2), the values from column 'a', without touching the order of the values in column 'b':

df.set_index('a')
   b
a   
1  3
2  4

whereas reindex changes the index labels but keeps each value in column 'b' attached to the label it had in the original df. A label that did not exist in the original index (here 2) gets NaN:

df.reindex(df.a.values).drop(columns='a')  # equivalent to df.reindex([1, 2]).drop(columns='a')
     b
1  4.0
2  NaN
# drop(columns='a') just hides column 'a', which is not relevant to this example

In short, reindex changes which index labels appear, and in what order, without changing the row values associated with each existing label, while set_index replaces the index labels with the values of a column without touching the order of the other values in the dataframe.
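As a further sketch of that summary, still using the answer's dataframe: reindex with labels that already exist only reorders rows, its fill_value parameter replaces the default NaN for missing labels, and the two methods can be chained. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Labels 1 and 0 both exist, so this only reorders the rows.
reordered = df.reindex([1, 0])

# Label 2 is missing from the original index; fill_value is used instead of NaN.
padded = df.reindex([0, 1, 2], fill_value=0)

# Relabel by column 'a' first, then select and reorder by those new labels.
combined = df.set_index('a').reindex([2, 1])

print(reordered)
print(padded)
print(combined)
```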
