Extracting tuples into rows in a pandas DataFrame
Question
I have a DataFrame that looks like this:
function_name  argument                              A         B
func1          (func1_arg1, func1_arg2)              value_a1  b
func2          (func2_arg1,)                         value_a2  b
func3          (func3_arg1, func3_arg2, func3_arg3)  value_a3  b
I want it to look like this:
function_name  argument    A         B
func1          func1_arg1  value_a1  b
func1          func1_arg2  value_a1  b
func2          func2_arg1  value_a2  b
func3          func3_arg1  value_a3  b
func3          func3_arg2  value_a3  b
func3          func3_arg3  value_a3  b
What would be a clear way to achieve this? In interactive Python mode, I tried the following:
import pandas as pd
D = {'function_name': ['func1', 'func2', 'func3'],
     'argument': [('func1_arg1', 'func1_arg2'),
                  ('func2_arg1',),
                  ('func3_arg1', 'func3_arg2', 'func3_arg3')],
     'A': ['value_a1', 'value_a2', 'value_a3'],
     'B': 'b'}
data_frame = pd.DataFrame(D)
multiplicity = data_frame.argument.apply(len)
new_index = data_frame.function_name.repeat(multiplicity).index
new_data_frame = data_frame.reindex(new_index)
Then I found that, to get an index that would let me work with the tuples, I have to reset the index by calling reset_index(drop=True) on new_data_frame. All of this looks quite ugly and clumsy. Is there a clean and concise way to solve this problem?
Recommended answer
If you have a DataFrame data_frame, set function_name, A and B as the index, apply pd.Series to the argument column, then stack and reset the index. This gives the desired output:
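import pandas as pd
D = {'function_name': ['func1', 'func2', 'func3'],
     'argument': [('func1_arg1', 'func1_arg2'),
                  ('func2_arg1',),
                  ('func3_arg1', 'func3_arg2', 'func3_arg3')],
     'A': ['value_a1', 'value_a2', 'value_a3'],
     'B': 'b'}
data_frame = pd.DataFrame(D)
# Spread each tuple across columns, then stack them back into one row per element.
# dropna() is explicit in case stack() keeps the NaN padding of shorter tuples.
new_frame = (data_frame.set_index(['function_name', 'A', 'B'])['argument']
             .apply(pd.Series).stack().dropna()
             .to_frame('argument').reset_index()
             .drop(columns='level_3'))  # keyword form; drop('level_3', 1) fails on pandas 2.x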
Output:
  function_name         A  B    argument
0         func1  value_a1  b  func1_arg1
1         func1  value_a1  b  func1_arg2
2         func2  value_a2  b  func2_arg1
3         func3  value_a3  b  func3_arg1
4         func3  value_a3  b  func3_arg2
5         func3  value_a3  b  func3_arg3
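As an alternative that is not part of the original answer: on pandas 0.25 or later, DataFrame.explode expands a list-like column into one row per element and repeats the other columns, which appears to cover this case directly. A minimal sketch, assuming the same data as in the question (the name exploded is only illustrative):

```python
import pandas as pd

data_frame = pd.DataFrame({
    'function_name': ['func1', 'func2', 'func3'],
    'argument': [('func1_arg1', 'func1_arg2'),
                 ('func2_arg1',),
                 ('func3_arg1', 'func3_arg2', 'func3_arg3')],
    'A': ['value_a1', 'value_a2', 'value_a3'],
    'B': 'b',
})

# explode() emits one row per tuple element and repeats the other columns;
# the original row labels are repeated too, so reset them to 0..n-1.
exploded = data_frame.explode('argument').reset_index(drop=True)
print(exploded)
```

Unlike the set_index/stack approach, this keeps the original column order (function_name, argument, A, B) and needs no helper column to be dropped afterwards.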