使用 pandas 的从宽到长的数据集 [英] Wide to long dataset using pandas

查看:69
本文介绍了使用 pandas 的从宽到长的数据集的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

有很多标题相似的问题,但是我无法解决我的数据集所遇到的问题.

There are a lot of questions out there with similar titles but I'm unable to solve the issues that I'm having with my dataset.

数据集:

ID   Country Type Region Gender IA01_Raw  IA01_Class1  IA01_Class2 IA02_Raw IA02_Class1 IA02_Class2 QA_Include QA_Comments

SC1  France  A    Europe Male   4         8            1            J         4            1           yes       N/A
SC2  France  A    Europe Female 2         7            2            Q         6            4           yes       N/A
SC3  France  B    Europe Male   3         7            2            K         8            2           yes       N/A
SC4  France  A    Europe Male   4         8            2            A         2            1           yes       N/A
SC5  France  B    Europe Male   1         7            1            F         1            3           yes       N/A
ID6  France  A    Europe Male   2         8            1            R         3            7           yes       N/A
ID7  France  B    Europe Male   2         8            1            Q         4            6           yes       N/A
UC8  France  B    Europe Male   4         8            2            P         4            2           yes       N/A

必需的输出:

ID   Country Type Region Gender IA Raw Class1 Class2 QA_Include QA_Comments

SC1  France  A    Europe Male   01 K   8      1      yes        N/A
SC1  France  A    Europe Male   01 L   8      1      yes       N/A
SC1  France  A    Europe Male   01 P   8      1      yes       N/A
SC1  France  A    Europe Male   02 Q   8      1      yes       N/A
SC1  France  A    Europe Male   02 R   8      1      yes       N/A
SC1  France  A    Europe Male   02 T   8      1      yes       N/A
SC1  France  A    Europe Male   03 G   8      1      yes       N/A
SC1  France  A    Europe Male   03 R   8      1      yes       N/A
SC1  France  A    Europe Male   03 G   8      1      yes       N/A
SC1  France  A    Europe Male   04 K   8      1      yes       N/A
SC1  France  A    Europe Male   04 A   8      1      yes       N/A
SC1  France  A    Europe Male   04 P   8      1      yes       N/A
SC1  France  A    Europe Male   05 R   8      1      yes       N/A
....


在数据集"中,我有名为 IA [X] _NAME 的列,其中 X = 1..9 NAME = Raw,Class1 Class2 .


In the Dataset I've columns which are names as IA[X]_NAME where X = 1..9 and NAME = Raw, Class1 and Class2.

我要做的是只是转置这些列,以使它看起来像必需"输出中所示的表,即 IA 将显示 X 值,就像原始这样,它们将显示其透视值.

What I am trying to do is to just transpose these columns so that it looks like the table shown in Required output i.e. IA will show X value and just like this raw and classes will show their perspective values.

因此,为了实现它,我将列切片为:

So in order to achieve it I sliced the columns as:

idVars = list(excel_df_final.columns[0:40]) + list(excel_df_final.columns[472:527]) #These contain columns like ID, Country, Type etc
valueVars = excel_df_final.columns[41:472].tolist() #All the IA_ columns

我不知道此步骤是否必要,但这为我提供了完美的列切片,但是当我将其放入melt时,它无法正常工作.我已经尝试了其他问题中几乎所有可用的方法.

I don't know if this step was necessary but this gave me the perfect slices of columns but when I put it in melt it is not working properly. I have tried almost every method that is available in other questions.

pd.melt(excel_df_final, id_vars=idVars,value_vars=valueVars)

我也尝试过:

excel_df_final.set_index(idVars)[41:472].unstack()

但是没有用,这是长期的广泛实施,但也没有用:

but didn't work and here is Wide to long implementation which also didn't work:

pd.wide_to_long(excel_df_final, stubnames = ['IA', 'Raw', 'Class1', 'Class2'], i=idVars, j=valueVars)

我得到的错误很长一段时间是:

The error I got for wide to long is:

ValueError:操作数不能与形状一起广播(95,) (431,)

ValueError: operands could not be broadcast together with shapes (95,) (431,)

由于我的数据集实际有526列,因此我将它们分为两个列表,其中一个包含95个列名,这将是i,其余431个是我需要在列表中显示的列行,如示例数据集中所示.

As my dataset has 526 columns in real, so that is why I've divided them into two lists one contains 95 column names which will be the i and the rest 431 are the one that I need to show in the row as shown in the sample data set.

推荐答案

这将使您入门.本质是使用set_index,将列转换为MultiIndex,然后转换为stack.可能存在更好的解决方案,但是我会这样做,因为这是您输出的一个简单步骤.

This will get you started. The essence is using set_index, column conversion to MultiIndex, then stack. Better solutions exist, possibly, but I would do it this way because it is an easy step to your output.

# Set the index with columns that we don't want to "transpose"
df2 = df.set_index([
   'ID', 'Country', 'Type', 'Region', 'Gender', 'QA_Include', 'QA_Comments'])
# Convert headers to MultiIndex -- this is so we can melt IA values
df2.columns = pd.MultiIndex.from_tuples(map(tuple, df2.columns.str.split('_')))
# Call stack to replicate data, then reset the index
out =  df2.stack(level=0).reset_index().rename({'level_7': 'IA'}, axis=1)

out

     ID Country Type  Region  Gender QA_Include  QA_Comments    IA  Class1  Class2 Raw
0   SC1  France    A  Europe    Male        yes          NaN  IA01       8       1   4
1   SC1  France    A  Europe    Male        yes          NaN  IA02       4       1   J
2   SC2  France    A  Europe  Female        yes          NaN  IA01       7       2   2
3   SC2  France    A  Europe  Female        yes          NaN  IA02       6       4   Q
4   SC3  France    B  Europe    Male        yes          NaN  IA01       7       2   3
5   SC3  France    B  Europe    Male        yes          NaN  IA02       8       2   K
6   SC4  France    A  Europe    Male        yes          NaN  IA01       8       2   4
7   SC4  France    A  Europe    Male        yes          NaN  IA02       2       1   A
8   SC5  France    B  Europe    Male        yes          NaN  IA01       7       1   1
9   SC5  France    B  Europe    Male        yes          NaN  IA02       1       3   F
10  ID6  France    A  Europe    Male        yes          NaN  IA01       8       1   2
11  ID6  France    A  Europe    Male        yes          NaN  IA02       3       7   R
12  ID7  France    B  Europe    Male        yes          NaN  IA01       8       1   2
13  ID7  France    B  Europe    Male        yes          NaN  IA02       4       6   Q
14  UC8  France    B  Europe    Male        yes          NaN  IA01       8       2   4
15  UC8  France    B  Europe    Male        yes          NaN  IA02       4       2   P

这篇关于使用 pandas 的从宽到长的数据集的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆