pandas :如何将列中的多个列表分成多行? [英] Pandas: How do I split multiple lists in columns into multiple rows?
问题描述
我有一只熊猫DataFrame
,看起来像这样:
I have a pandas DataFrame
that looks like the following:
bus_uid bus_type type obj_uid \
0 biomass: DEB31 biomass output Simple_139804698384200
0 biomass: DEB31 biomass other duals
0 biomass: DEB31 biomass other excess
datetime \
0 DatetimeIndex(['2015-01-01 00:00:00', '2015-01-01 01:00:00', '2015-01-01 02:00:00', ...
0 DatetimeIndex(['2015-01-01 00:00:00', '2015-01-01 01:00:00', '2015-01-01 02:00:00', ...
0 DatetimeIndex(['2015-01-01 00:00:00', '2015-01-01 01:00:00', '2015-01-01 02:00:00', ...
values
0 [1.0, 2.0, 3.0, ...
0 [4.0, 5.0, 6.0, ...
0 [7.0, 8.0, 9.0, ...
并希望将其转换为以下格式:
And want to convert it into the following format:
bus_uid bus_type type obj_uid datetime values
0 biomass: DEB31 biomass output Simple_139804698384200 2015-01-01 00:00:00 1.0
0 biomass: DEB31 biomass output Simple_139804698384200 2015-01-01 01:00:00 2.0
0 biomass: DEB31 biomass output Simple_139804698384200 2015-01-01 02:00:00 3.0
0 biomass: DEB31 biomass other duals 2015-01-01 00:00:00 4.0
0 biomass: DEB31 biomass other duals 2015-01-01 01:00:00 5.0
0 biomass: DEB31 biomass other duals 2015-01-01 02:00:00 6.0
0 biomass: DEB31 biomass other excess 2015-01-01 00:00:00 7.0
0 biomass: DEB31 biomass other excess 2015-01-01 01:00:00 8.0
0 biomass: DEB31 biomass other excess 2015-01-01 02:00:00 9.0
列datetime
和values
具有相同的尺寸.
我已经在此处,但无法通过两列来解决我的问题.
I have already asked a similar question here but couldn't manage to apply the solution for my problem with two columns.
将DataFrame
转换为所需格式的最佳方法是什么?
What's the best way to convert the DataFrame
into the required format?
推荐答案
您可以遍历行以从单元格中提取Index
和Series
信息.当您需要同时提取信息时,我认为reshaping
方法效果不佳:
You could iterate through the rows to extract the Index
and Series
info from the cells. I don't think that reshaping
methods work well when you need to extract info at the same time:
样本数据:
rows = 3
df = pd.DataFrame(data={'bus_uid': list(repeat('biomass: DEB31', rows)), 'type': list(repeat('biomass', 3)), 'id': ['id1', 'id2', 'id3'], 'datetime': list(repeat(pd.DatetimeIndex(start=datetime(2016,1,1), periods=3, freq='D'), rows)), 'values': list(repeat([1,2,3], rows))})
bus_uid datetime id \
0 biomass: DEB31 DatetimeIndex(['2016-01-01', '2016-01-02', '20... id1
1 biomass: DEB31 DatetimeIndex(['2016-01-01', '2016-01-02', '20... id2
2 biomass: DEB31 DatetimeIndex(['2016-01-01', '2016-01-02', '20... id3
type values
0 biomass [1, 2, 3]
1 biomass [1, 2, 3]
2 biomass [1, 2, 3]
在遍历DataFrame
rows
时构建新的DataFrame
:
new_df = pd.DataFrame()
for index, cols in df.iterrows():
extract_df = pd.DataFrame.from_dict({'datetime': cols.ix['datetime'], 'values': cols.ix['values']})
extract_df = pd.concat([extract_df, cols.drop(['datetime', 'values']).to_frame().T], axis=1).fillna(method='ffill').fillna(method='bfill')
new_df = pd.concat([new_df, extract_df], ignore_index=True)
获得:
datetime values bus_uid id type
0 2016-01-01 1 biomass: DEB31 id1 biomass
1 2016-01-02 2 biomass: DEB31 id1 biomass
2 2016-01-03 3 biomass: DEB31 id1 biomass
3 2016-01-01 1 biomass: DEB31 id2 biomass
4 2016-01-02 2 biomass: DEB31 id2 biomass
5 2016-01-03 3 biomass: DEB31 id2 biomass
6 2016-01-01 1 biomass: DEB31 id3 biomass
7 2016-01-02 2 biomass: DEB31 id3 biomass
8 2016-01-03 3 biomass: DEB31 id3 biomass
这篇关于 pandas :如何将列中的多个列表分成多行?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!