Pandas: reshape data with duplicate row names to columns


Question

I have a data set that's sort of like this (first lines shown):

Sample  Detector  Cq
P_1     106       23.53152
P_1     106       23.152458
P_1     106       23.685083
P_1     135       24.465698
P_1     135       23.86892
P_1     135       23.723469
P_1     17        22.524242
P_1     17        20.658733
P_1     17        21.146122

Both "Sample" and "Detector" columns contain duplicated values ("Cq" is unique): to be precise, each "Detector" appears 3 times for each sample, because it's a replicate in the data.

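What I need to do is:

  • Reshape the table so that the columns contain "Sample" and the rows "Detector"
  • Rename the duplicated columns so that I know which replicate each one is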

I thought that DataFrame.pivot would do the trick, but it fails because of the duplicate data. What would be the best approach? Rename the duplicates, then reshape, or is there a better option?

I thought it over, and I think it is better to state the purpose: for each "Sample", I need to store the mean and standard deviation of each "Detector".

Recommended answer

It looks like what you may be looking for is a hierarchically indexed DataFrame (a pandas MultiIndex).

Would something like this work?

import numpy as np
import pandas

# Build a sample dataframe (cq values are random placeholders)
a = ['P_1'] * 9
b = [106, 106, 106, 135, 135, 135, 17, 17, 17]
c = np.random.randint(1, 100, 9)
df = pandas.DataFrame(data=list(zip(a, b, c)), columns=['sample', 'detector', 'cq'])

# Add a replicate-number column (assumes rows come in ordered groups of three)
df['rep_num'] = [1, 2, 3] * (len(df) // 3)

# Convert to a multi-indexed DataFrame
df_multi = df.set_index(['sample', 'detector', 'rep_num'])

#--------------Resulting Dataframe (cq values are random, yours will differ)---------------------

                             cq
sample detector rep_num    
P_1    106      1        97
                2        83
                3        81
       135      1        46
                2        92
                3        89
       17       1        58
                2        26
                3        75
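
As a follow-up sketch (not part of the original answer), the same idea can be taken a bit further using the numbers from the question. Here groupby(...).cumcount() numbers the replicates without assuming the rows arrive in sorted groups of three, unstack produces the wide layout asked for, and groupby(...).agg gives the mean and standard deviation per sample and detector. The variable names wide and stats are only illustrative.

```python
import pandas as pd

# Data copied from the question: one sample, three detectors, three replicates each.
df = pd.DataFrame({
    "Sample": ["P_1"] * 9,
    "Detector": [106, 106, 106, 135, 135, 135, 17, 17, 17],
    "Cq": [23.53152, 23.152458, 23.685083,
           24.465698, 23.86892, 23.723469,
           22.524242, 20.658733, 21.146122],
})

# Number the replicates within each (Sample, Detector) group: 1, 2, 3, ...
df["rep_num"] = df.groupby(["Sample", "Detector"]).cumcount() + 1

# Wide layout: one row per detector, one column per (sample, replicate) pair.
wide = (
    df.set_index(["Detector", "Sample", "rep_num"])["Cq"]
      .unstack(["Sample", "rep_num"])
)

# Mean and standard deviation of Cq for each sample/detector combination.
stats = df.groupby(["Sample", "Detector"])["Cq"].agg(["mean", "std"])

print(wide)
print(stats)
```

Note that std here is the pandas default, the sample standard deviation (ddof=1). If only the mean and standard deviation are needed, the groupby step works directly on the long table, and neither the replicate numbering nor the reshape is required.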
