Pandas Dataframe: Expand rows with lists to multiple rows with desired indexing for all columns
Question
I have time series data in a pandas DataFrame. The index is the time at the start of each measurement, and each column holds a list of values recorded at a fixed sampling rate (the sampling interval is the difference between consecutive index values divided by the number of elements in the list).
This is what it looks like:
Time A B ....... Z
0 [1, 2, 3, 4] [1, 2, 3, 4]
2 [5, 6, 7, 8] [5, 6, 7, 8]
4 [9, 10, 11, 12] [9, 10, 11, 12]
6 [13, 14, 15, 16] [13, 14, 15, 16 ]
...
I want to expand each row, in all columns, into multiple rows such that:
Time A B .... Z
0 1 1
0.5 2 2
1 3 3
1.5 4 4
2 5 5
2.5 6 6
.......
So far I am thinking along these lines (the code doesn't work):
def expand_row(dstruc):
    for i in range(len(dstruc)):
        for j in range(1, len(dstruc[i])):
            dstruc.loc[i + j / len(dstruc[i])] = dstruc[i][j]
        dstruc.loc[i] = dstruc[i][0]
    return dstruc

expanded = testdf.apply(expand_row)
I also tried using split(',') and stack() together, but I was not able to fix the indexing appropriately.
Recommended answer
Probably not ideal, but this can be done using groupby and applying a function that returns the expanded DataFrame for each row (here the time difference between consecutive rows is assumed to be fixed at 2.0, and Time is assumed to be a regular column rather than the index):
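import pandas as pd

def expand(x):
    # x is the one-row group for a single Time value; take the list from each column
    data = {c: x[c].iloc[0] for c in x if c != 'Time'}
    n = len(data['A'])
    step = 2.0 / n  # spacing between samples within one row
    # x.name is the group key, i.e. the start time of this row
    data['Time'] = [x.name + i * step for i in range(n)]
    return pd.DataFrame(data)

print(df.groupby('Time').apply(expand).set_index('Time', drop=True))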
Output:
A B
Time
0.0 1 1
0.5 2 2
1.0 3 3
1.5 4 4
2.0 5 5
2.5 6 6
3.0 7 7
3.5 8 8
4.0 9 9
4.5 10 10
5.0 11 11
5.5 12 12
6.0 13 13
6.5 14 14
7.0 15 15
7.5 16 16
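As an alternative sketch (not part of the original answer), the same expansion can probably be written without groupby/apply by using DataFrame.explode, which accepts a list of columns in pandas 1.3 and later. The helper name expand_lists, its time_col and period parameters, and the sample frame are all made up for illustration. The sketch assumes Time is a regular column, that every list in a given row has the same length, and that the spacing between rows is a fixed 2.0 as in the answer above.

```python
import pandas as pd

# Sample data mirroring the question (column names are illustrative).
df = pd.DataFrame({
    "Time": [0, 2, 4, 6],
    "A": [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]],
    "B": [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]],
})

def expand_lists(frame, time_col="Time", period=2.0):
    """Hypothetical helper: one output row per list element.

    Assumes all list columns in a row have equal length and that
    consecutive rows are `period` apart in time.
    """
    value_cols = [c for c in frame.columns if c != time_col]
    # Number of samples in each original row.
    lengths = frame[value_cols[0]].map(len)
    # Explode every list column at once; the original index is repeated.
    out = frame.explode(value_cols)
    # Position of each sample within its original row: 0, 1, ..., n-1.
    pos = out.groupby(level=0).cumcount().to_numpy()
    # Per-sample time step, repeated to match the exploded rows.
    step = period / lengths.loc[out.index].to_numpy()
    out[time_col] = out[time_col].to_numpy() + pos * step
    # Exploded columns come back as object dtype; infer plain numeric dtypes.
    return out.set_index(time_col).infer_objects()

result = expand_lists(df)
print(result)
```

If Time is the index rather than a column, calling df.reset_index() first should bring the frame into the shape this sketch expects.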