Extracting tuples into rows in a pandas DataFrame

Problem description

I have a DataFrame that looks like this:

function_name    argument                              A        B

func1            (func1_arg1, func1_arg2)             value_a1  b
func2            (func2_arg1,)                        value_a2  b
func3            (func3_arg1, func3_arg2, func3_arg3) value_a3  b

I want it to look like this:

function_name   argument    A          B

func1           func1_arg1  value_a1   b
func1           func1_arg2  value_a1   b
func2           func2_arg1  value_a2   b
func3           func3_arg1  value_a3   b
func3           func3_arg2  value_a3   b
func3           func3_arg3  value_a3   b

What would be a clear way to achieve this? In an interactive Python session, I tried the following:

import pandas as pd


D = {'function_name': ['func1', 'func2', 'func3'],
     'argument': [('func1_arg1', 'func1_arg2'), 
                  ('func2_arg1',), 
                  ('func3_arg1', 'func3_arg2', 'func3_arg3')],
     'A': ['value_a1', 'value_a2', 'value_a3'],
     'B': 'b'}
data_frame = pd.DataFrame(D)
multiplicity = data_frame.argument.apply(len)
new_index = data_frame.function_name.repeat(multiplicity).index
new_data_frame = data_frame.reindex(new_index)

I then found that, to get an index that lets me work with the tuples, I have to reset it by calling reset_index(drop=True) on new_data_frame. All of this looks quite ugly and clumsy. Is there a clean and concise way to solve this problem?

Recommended answer

Given a DataFrame data_frame, set the index to the columns you want to keep (function_name, A, and B), apply pd.Series to the argument column so each tuple element gets its own column, then stack the result and reset the index. That gives the desired output:

import pandas as pd

D = {'function_name': ['func1', 'func2', 'func3'],
     'argument': [('func1_arg1', 'func1_arg2'),
                  ('func2_arg1',),
                  ('func3_arg1', 'func3_arg2', 'func3_arg3')],
     'A': ['value_a1', 'value_a2', 'value_a3'],
     'B': 'b'}
data_frame = pd.DataFrame(D)

# Tuples become columns, stack() turns them into rows; dropna() removes NaN padding.
new_frame = (
    data_frame.set_index(['function_name', 'A', 'B'])['argument']
    .apply(pd.Series).stack().dropna()
    .to_frame('argument').reset_index()
    .drop(columns='level_3')  # helper level created by stack()
)

Output:


  function_name         A  B    argument
0         func1  value_a1  b  func1_arg1
1         func1  value_a1  b  func1_arg2
2         func2  value_a2  b  func2_arg1
3         func3  value_a3  b  func3_arg1
4         func3  value_a3  b  func3_arg2
5         func3  value_a3  b  func3_arg3
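
As an alternative not mentioned in the answer above: pandas 0.25 and later provide DataFrame.explode, which turns each element of a list-like cell (tuples included) into its own row and repeats the other columns. The sketch below assumes such a pandas version is available and reuses the question's data; the name exploded is only illustrative.

```python
import pandas as pd

D = {'function_name': ['func1', 'func2', 'func3'],
     'argument': [('func1_arg1', 'func1_arg2'),
                  ('func2_arg1',),
                  ('func3_arg1', 'func3_arg2', 'func3_arg3')],
     'A': ['value_a1', 'value_a2', 'value_a3'],
     'B': 'b'}
data_frame = pd.DataFrame(D)

# explode() emits one row per tuple element and repeats the original
# index label for each of them, so reset the index to get 0..n-1.
exploded = data_frame.explode('argument').reset_index(drop=True)

print(exploded)
```

This keeps the original column order (function_name, argument, A, B), so no reordering is needed to match the layout requested in the question.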
