Pandas multiple index dataframe: creating new index or appending to existing index
Question
I have a Pandas dataframe, multi_df, which has a multi-index made of the code, colour, texture and shape values as below:
import pandas as pd
import numpy as np
df = pd.DataFrame({'id' : range(1,9),
                   'code' : ['one', 'one', 'two', 'three',
                             'two', 'three', 'one', 'two'],
                   'colour': ['black', 'white','white','white',
                              'black', 'black', 'white', 'white'],
                   'texture': ['soft', 'soft', 'hard','soft','hard',
                               'hard','hard','hard'],
                   'shape': ['round', 'triangular', 'triangular','triangular','square',
                             'triangular','round','triangular'],
                   'amount' : np.random.randn(8)}, columns= ['id','code','colour', 'texture', 'shape', 'amount'])
multi_df = df.set_index(['code','colour','texture','shape']).sort_index()['id']
multi_df
code   colour  texture  shape
one    black   soft     round         1
       white   hard     round         7
               soft     triangular    2
three  black   hard     triangular    6
       white   soft     triangular    4
two    black   hard     square        5
       white   hard     triangular    3
                        triangular    8
Name: id, dtype: int64
I am given a new (index, id) pair: new_index and new_id. If the new_index combination already exists in multi_df, I want to append new_id to the existing index. If new_index does not exist, I want to create it and add the id value. For instance:
new_id = 15
new_index = ('two','white','hard', 'triangular')
if new_index in multi_df.index:
    # APPEND TO EXISTING: multi_df[('two','white','hard', 'triangular')].append(new_id)
    pass
else:
    # CREATE NEW index and put the new_id in.
    pass
However, I cannot figure out the syntax for appending (IF) or creating (ELSE) the new index. Any help would be most welcome.
P.S.: For appending, I can see that the object I am trying to add new_id to is a Series. However, append() does not work.
type(multi_df[('two','white','hard', 'triangular')])
<class 'pandas.core.series.Series'>
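Two details from the question can be checked directly. The sketch below rebuilds the question's data (assumption: the random amount column is dropped because only id is used) and shows that the membership test in the IF branch works with a plain tuple, and that multi_df[new_index] is a Series because that label occurs twice (ids 3 and 8).

```python
import pandas as pd

# Rebuild the question's data; the random 'amount' column is omitted here
df = pd.DataFrame({
    'id': range(1, 9),
    'code': ['one', 'one', 'two', 'three', 'two', 'three', 'one', 'two'],
    'colour': ['black', 'white', 'white', 'white', 'black', 'black', 'white', 'white'],
    'texture': ['soft', 'soft', 'hard', 'soft', 'hard', 'hard', 'hard', 'hard'],
    'shape': ['round', 'triangular', 'triangular', 'triangular', 'square',
              'triangular', 'round', 'triangular'],
})
multi_df = df.set_index(['code', 'colour', 'texture', 'shape']).sort_index()['id']

new_index = ('two', 'white', 'hard', 'triangular')

# A full-length tuple can be tested for membership in a MultiIndex
print(new_index in multi_df.index)                          # True
print(('four', 'red', 'soft', 'round') in multi_df.index)   # False

# The label is duplicated (ids 3 and 8), so selecting it yields a Series, not a scalar
subset = multi_df[new_index]
print(type(subset).__name__, sorted(subset.tolist()))       # Series [3, 8]
```

So the IF condition itself is fine; what is missing is a way to add a row to multi_df, and subset is a separate object, so adding to it would not change multi_df.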
Recommended answer
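Because a MultiIndex can hold duplicate labels, the IF and ELSE branches come down to the same operation: concatenate a one-row Series whose index is new_index. If new_index already exists, multi_df gains another row under that label; otherwise the label is created. Note that append() (removed in pandas 2.0, where pd.concat() replaces it) creates a new Series every time, so it is very slow if you call it repeatedly in a for loop. For a single pair: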
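idx = pd.MultiIndex.from_tuples([new_index], names=multi_df.index.names)
data = pd.Series(new_id, index=idx, name=multi_df.name)
# Series.append() was removed in pandas 2.0; assign the concatenated result back
multi_df = pd.concat([multi_df, data]).sort_index()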
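The same idea can be wrapped in a small helper and extended to many pairs at once. This is a sketch, not part of the original answer: add_id and add_ids are hypothetical names, and it assumes a pandas version with pd.concat (any 1.x or 2.x). Collecting all new rows first and calling pd.concat once avoids the per-call copy that makes a for loop slow.

```python
import pandas as pd


def add_ids(series, pairs):
    """Hypothetical helper: add many (index_tuple, id) pairs with a single concat."""
    # Keep the level names and the Series name so concat does not drop them
    idx = pd.MultiIndex.from_tuples([key for key, _ in pairs], names=series.index.names)
    new_rows = pd.Series([value for _, value in pairs], index=idx, name=series.name)
    # One concat for the whole batch instead of one copy per pair
    return pd.concat([series, new_rows]).sort_index()


def add_id(series, new_index, new_id):
    """Hypothetical helper: add one id under new_index, whether or not it exists yet."""
    return add_ids(series, [(new_index, new_id)])


# The question's data, without the random 'amount' column
df = pd.DataFrame({
    'id': range(1, 9),
    'code': ['one', 'one', 'two', 'three', 'two', 'three', 'one', 'two'],
    'colour': ['black', 'white', 'white', 'white', 'black', 'black', 'white', 'white'],
    'texture': ['soft', 'soft', 'hard', 'soft', 'hard', 'hard', 'hard', 'hard'],
    'shape': ['round', 'triangular', 'triangular', 'triangular', 'square',
              'triangular', 'round', 'triangular'],
})
multi_df = df.set_index(['code', 'colour', 'texture', 'shape']).sort_index()['id']

# Existing label (the IF case): a third row appears under it
multi_df = add_id(multi_df, ('two', 'white', 'hard', 'triangular'), 15)
# Missing label (the ELSE case): the label is created
multi_df = add_id(multi_df, ('four', 'red', 'soft', 'round'), 16)
# Several pairs in one concat
multi_df = add_ids(multi_df, [(('one', 'black', 'soft', 'round'), 17),
                              (('five', 'blue', 'hard', 'square'), 18)])

print(sorted(multi_df[('two', 'white', 'hard', 'triangular')].tolist()))  # [3, 8, 15]
print(len(multi_df))  # 12
```

add_ids expects a non-empty list of pairs; the sort_index() call is optional and only keeps the display grouped as in the question.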