Adding rows per group in pandas / IPython when a group is missing rows
Problem description
I have a dataframe that contains, for each group, the number of observations during a certain period. Some groups don't contain all periods, and for these groups I want to append rows for the missing periods, so that each group has a row for each of the 6 periods.
My current df looks something like this:
ID PERIOD VAlUE
1 1 10
1 2 8
1 3 8
1 4 15
1 5 6
1 6 44
2 1 NONE
3 2 4
3 5 25
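To make the answer below reproducible, the input frame can be built directly from this table. This is a sketch that assumes the missing value is stored as the string 'NONE' (in real data it might be None or NaN); the column name VAlUE is kept exactly as written in the question.

```python
import pandas as pd

# Sample data transcribed from the table above; the missing value is kept
# as the string 'NONE', matching how it is written in the question.
df = pd.DataFrame({
    'ID':     [1, 1, 1, 1, 1, 1, 2, 3, 3],
    'PERIOD': [1, 2, 3, 4, 5, 6, 1, 2, 5],
    'VAlUE':  [10, 8, 8, 15, 6, 44, 'NONE', 4, 25],
})
print(df)
```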
I want a dataframe looking like this.
ID PERIOD VAlUE
1 1 10
1 2 8
1 3 8
1 4 15
1 5 6
1 6 44
2 1 NONE
2 2 NONE
2 3 NONE
2 4 NONE
2 5 NONE
2 6 NONE
3 1 NONE
3 2 4
3 3 NONE
3 4 NONE
3 5 25
3 6 NONE
So what happened:
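- For ID == 1, nothing happened, because it already contains all 6 periods.
- For ID == 2, 5 rows were appended, one for each period it did not have in the original df.
- For ID == 3, 4 rows were appended, one for each period it did not have in the original df, i.e. rows for periods 1, 3, 4 and 6.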
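The rule described in this list (each group gets one new row per period it lacks) can be checked by computing the missing periods for every ID. A minimal sketch, assuming the periods are the integers 1 to 6; the name ALL_PERIODS is illustrative, not from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'ID':     [1, 1, 1, 1, 1, 1, 2, 3, 3],
    'PERIOD': [1, 2, 3, 4, 5, 6, 1, 2, 5],
    'VAlUE':  [10, 8, 8, 15, 6, 44, 'NONE', 4, 25],
})

ALL_PERIODS = set(range(1, 7))  # assumption: periods are numbered 1..6

# For each ID, the periods that need a new row are the ones it does not have.
missing = {
    group_id: sorted(ALL_PERIODS - set(periods))
    for group_id, periods in df.groupby('ID')['PERIOD']
}
print(missing)
```

For ID 3 this gives periods 1, 3, 4 and 6, matching the last bullet.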
I really don't have a clue how to do it, so help would really be appreciated.
Recommended answer
You can set the index to 'ID' and 'PERIOD', then construct a new index from the product of the unique values of both columns and pass it as the new index to reindex. reindex has an optional fill_value parameter, which you can set to the string 'NONE':
In [158]:
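import pandas as pd
# unique IDs and unique PERIODs; their product gives every (ID, PERIOD) pair
iterables = [df['ID'].unique(), df['PERIOD'].unique()]
df = df.set_index(['ID', 'PERIOD'])
# reindex to the full product; rows that did not exist are filled with 'NONE'
df = df.reindex(index=pd.MultiIndex.from_product(iterables, names=['ID', 'PERIOD']), fill_value='NONE').reset_index()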
df
Out[158]:
ID PERIOD VAlUE
0 1 1 10
1 1 2 8
2 1 3 8
3 1 4 15
4 1 5 6
5 1 6 44
6 2 1 NONE
7 2 2 NONE
8 2 3 NONE
9 2 4 NONE
10 2 5 NONE
11 2 6 NONE
12 3 1 NONE
13 3 2 4
14 3 3 NONE
15 3 4 NONE
16 3 5 25
17 3 6 NONE
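The session above builds the period list with df['PERIOD'].unique(), which only yields all six periods because every period occurs in at least one row (here, via ID 1). A sketch that does not depend on that, assuming the periods are always 1 to 6, passes an explicit range to MultiIndex.from_product instead:

```python
import pandas as pd

df = pd.DataFrame({
    'ID':     [1, 1, 1, 1, 1, 1, 2, 3, 3],
    'PERIOD': [1, 2, 3, 4, 5, 6, 1, 2, 5],
    'VAlUE':  [10, 8, 8, 15, 6, 44, 'NONE', 4, 25],
})

# Take the periods from an explicit range (assumed to be 1..6) rather than
# df['PERIOD'].unique(), so a period no group has observed still gets a row.
full_index = pd.MultiIndex.from_product(
    [df['ID'].unique(), range(1, 7)], names=['ID', 'PERIOD']
)

result = (
    df.set_index(['ID', 'PERIOD'])
      .reindex(index=full_index, fill_value='NONE')
      .reset_index()
)
print(result)
```

If fill_value is omitted, the new rows get NaN instead of the string 'NONE', which is usually easier to handle in numeric code.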
So breaking the above down:
In [160]:
# create a list of the iterable index values we want to generate all product combinations from
iterables = [df['ID'].unique(),df['PERIOD'].unique()]
iterables
Out[160]:
[array([1, 2, 3], dtype=int64), array([1, 2, 3, 4, 5, 6], dtype=int64)]
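To see what reindex will receive, MultiIndex.from_product can be inspected on its own. A small sketch, with the values from Out[160] typed in as plain lists for brevity:

```python
import pandas as pd

# The same values as `iterables` above: unique IDs and unique periods.
ids = [1, 2, 3]
periods = [1, 2, 3, 4, 5, 6]

idx = pd.MultiIndex.from_product([ids, periods], names=['ID', 'PERIOD'])

# Every (ID, PERIOD) pair appears exactly once: 3 * 6 = 18 combinations.
print(len(idx))  # 18
print(idx[:3].tolist())
```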
In [163]:
# set the index to ID and PERIOD
df = df.set_index(['ID','PERIOD'])
df
Out[163]:
VAlUE
ID PERIOD
1 1 10
2 8
3 8
4 15
5 6
6 44
2 1 NONE
3 2 4
5 25
In [164]:
# reindex and pass the product from iterables as the new index
df.reindex(index=pd.MultiIndex.from_product(iterables, names=['ID', 'PERIOD']), fill_value='NONE').reset_index()
Out[164]:
ID PERIOD VAlUE
0 1 1 10
1 1 2 8
2 1 3 8
3 1 4 15
4 1 5 6
5 1 6 44
6 2 1 NONE
7 2 2 NONE
8 2 3 NONE
9 2 4 NONE
10 2 5 NONE
11 2 6 NONE
12 3 1 NONE
13 3 2 4
14 3 3 NONE
15 3 4 NONE
16 3 5 25
17 3 6 NONE
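A different technique reaches the same 18 rows: build the full grid of combinations as its own frame and left-join the data onto it with merge. This is an alternative to the reindex answer, not part of it, and like the earlier variant it assumes the periods are 1 to 6.

```python
import pandas as pd

df = pd.DataFrame({
    'ID':     [1, 1, 1, 1, 1, 1, 2, 3, 3],
    'PERIOD': [1, 2, 3, 4, 5, 6, 1, 2, 5],
    'VAlUE':  [10, 8, 8, 15, 6, 44, 'NONE', 4, 25],
})

# Every (ID, PERIOD) combination as a plain two-column frame.
grid = pd.MultiIndex.from_product(
    [df['ID'].unique(), range(1, 7)], names=['ID', 'PERIOD']
).to_frame(index=False)

# Left join keeps every grid row; combinations without data get NaN,
# which is then replaced by the string 'NONE'.
result = grid.merge(df, on=['ID', 'PERIOD'], how='left')
result['VAlUE'] = result['VAlUE'].fillna('NONE')
print(result)
```

The merge version keeps the grid's row order and leaves df's index untouched, at the cost of building an extra frame.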