Converting a pandas crosstab into a stacked dataframe (a regular table)


Problem description

Given a pandas crosstab, how do you convert it into a stacked dataframe?

Assume you have a stacked dataframe and convert it into a crosstab. Now I would like to revert to the original stacked dataframe. I searched for an existing question that addresses this requirement but could not find one that matches exactly. If I have missed one, please leave a note in the comment section.

I would like to document the best practice here. So, thank you for your support.

I know that pandas.DataFrame.stack() would be the best approach, but one needs to be careful about which "level" the stacking is applied to.

Input: crosstab:


    Label   a   b   c   d   r
    ID                  
    1       0   1   0   0   0
    2       1   1   0   1   1
    3       1   0   0   0   1
    4       1   0   0   1   0
    6       1   0   0   0   0
    7       0   0   1   0   0
    8       1   0   1   0   0
    9       0   1   0   0   0

Output: stacked DataFrame:


        ID  Label
    0   1   b
    1   2   a
    2   2   b
    3   2   d
    4   2   r
    5   3   a
    6   3   r
    7   4   a
    8   4   d
    9   6   a
    10  7   c
    11  8   a
    12  8   c
    13  9   b

Step-by-step explanation:

First, let's make a function that creates our data. Note that it generates the stacked dataframe randomly, so the final output may differ from what I have given above.

Helper function: make the stacked and crosstab dataframes

import numpy as np
import pandas as pd

# Make stacked dataframe
def _create_df():
    """
    This dataframe will be used to create a crosstab
    """
    B = np.array(list('abracadabra'))
    A = np.arange(len(B))
    AB = list()
    for i in range(20):
        a = np.random.randint(1,10)
        b = np.random.randint(1,10)
        AB += [(a,b)]
    AB = np.unique(np.array(AB), axis=0)
    AB = np.unique(np.array(list(zip(A[AB[:,0]], B[AB[:,1]]))), axis=0)
    AB_df = pd.DataFrame({'ID': AB[:,0], 'Label': AB[:,1]})
    return AB_df

original_stacked_df = _create_df()

# Make crosstab (ID values become the index, Label values the columns)
crosstab_df = pd.crosstab(original_stacked_df['ID'],
                          original_stacked_df['Label'])

What to expect?

I expect a function that regenerates the stacked dataframe from the crosstab. I will provide my own solution in the answer section. If you can suggest something better, that would be great.

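Recommended answer

You can do this with stack:

# df is the crosstab. Mask the zero cells as NaN, stack the columns into rows,
# drop the NaN rows explicitly (so the result does not depend on whether
# stack() drops them itself), then drop the leftover value column, named 0.
df[df.astype(bool)].stack().dropna().reset_index().drop(columns=0)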
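
The same idea can be sketched as a small round trip, filtering after stacking instead of masking before it. This is only an illustration, not the answerer's code: the function name `crosstab_to_stacked` and the `pairs` list are hypothetical, the data is copied from the input and output tables in the question, and it assumes every crosstab cell is 0 or 1 (each ID/Label pair occurs at most once).

```python
import pandas as pd

def crosstab_to_stacked(crosstab):
    """Rebuild the (ID, Label) rows of a 0/1 crosstab (hypothetical helper)."""
    # stack() turns the columns into an inner index level, giving a
    # Series indexed by (ID, Label) that holds the cell counts.
    counts = crosstab.stack()
    # Keep only the pairs that actually occur.
    present = counts[counts > 0]
    # The index levels are exactly the two columns we want back.
    return present.index.to_frame(index=False)

# Example data matching the tables in the question.
pairs = [(1, 'b'), (2, 'a'), (2, 'b'), (2, 'd'), (2, 'r'),
         (3, 'a'), (3, 'r'), (4, 'a'), (4, 'd'), (6, 'a'),
         (7, 'c'), (8, 'a'), (8, 'c'), (9, 'b')]
original = pd.DataFrame(pairs, columns=['ID', 'Label'])

crosstab = pd.crosstab(original['ID'], original['Label'])
restored = crosstab_to_stacked(crosstab)
print(restored)
```

The column names come from the names of the crosstab's index and columns ('ID' and 'Label' here), so nothing needs to be renamed afterwards. If a pair can occur more than once, the counts would have to be expanded back into repeated rows, which this sketch does not attempt.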
