Pandas: combining results from a function on a subset of a dataframe with the original dataframe


Question

I am new to Pandas, so please forgive my inexperience. Nonetheless, I have already worked through many parts of my question here.

For simplicity, let's take the example from the Wikipedia article on quantile normalization:

A    5    4    3
B    2    1    4
C    3    4    6
D    4    2    8

and update it to fit the data structure that I am dealing with:

import pandas as pd

df = pd.DataFrame({
        'gene': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd', 'e', 'e', 'e', 'f', 'f', 'f'],
        'rep': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
        'val': [5, 4, 3, 2, 1, 4, 3, 4, 6, 4, 2, 8, 0, 1, 0, 0, 2, 4],
        'subset': ['y', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'n', 'n', 'n', 'n', 'n', 'n']
})


    gene rep subset val
0   a   1   y   5
1   a   2   y   4
2   a   3   y   3
3   b   1   y   2
4   b   2   y   1
5   b   3   y   4
6   c   1   y   3
7   c   2   y   4
8   c   3   y   6
9   d   1   y   4
10  d   2   y   2
11  d   3   y   8
12  e   1   n   0
13  e   2   n   1
14  e   3   n   0
15  f   1   n   0
16  f   2   n   2
17  f   3   n   4

This flattened structure might seem odd and inefficient (at the very least redundant), but it is the best option for my particular use case, so please bear with me.

In this example we want to run quantile normalization on the original data (genes a to d), so we take the subset selected by a metadata key:

sub = df[df.subset == 'y']

The shape is still off, so I use the pivot function, which I recently learned about from @Wan in my GroupBy question:

piv = sub.pivot(index='gene', columns='rep', values='val')

rep 1   2   3
gene            
a   5   4   3
b   2   1   4
c   3   4   6
d   4   2   8

This results in the loss of the other columns, which may or may not be relevant later. Carrying on, I apply my quantile normalization function, which can handle mixed dataframes:

quantile_normalize(piv, [1, 2, 3])

rep     1   2   3
gene            
a   5.666667    4.666667    2.000000
b   2.000000    2.000000    3.000000
c   3.000000    4.666667    4.666667
d   4.666667    3.000000    5.666667

which is the expected result from the wiki:

A    5.67    4.67    2.00
B    2.00    2.00    3.00
C    3.00    4.67    4.67
D    4.67    3.00    5.67

Neat.
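The question does not show the body of quantile_normalize. The following is only a minimal sketch of what such a function might look like, not the author's implementation. The name quantile_normalize_sketch is hypothetical, and it assumes tied values share the lowest rank (rank method 'min'), which is the tie handling that reproduces the table above.

```python
import pandas as pd


def quantile_normalize_sketch(frame, columns):
    """Hypothetical stand-in for the question's quantile_normalize."""
    values = frame[columns]
    # Sort every column independently, then average across columns
    # to get one mean value per rank position.
    sorted_cols = pd.DataFrame(
        {c: values[c].sort_values().to_numpy() for c in columns}
    )
    rank_means = sorted_cols.mean(axis=1)
    rank_means.index = rank_means.index + 1  # ranks start at 1

    # Assumption: ties take the lowest rank of the tied group.
    ranks = values.rank(method='min').astype(int)

    out = frame.copy()
    for c in columns:
        # Replace each value with the mean belonging to its rank.
        out[c] = ranks[c].map(rank_means)
    return out


# The pivoted subset from the question, rebuilt by hand.
piv = pd.DataFrame(
    {1: [5, 2, 3, 4], 2: [4, 1, 4, 2], 3: [3, 4, 6, 8]},
    index=pd.Index(['a', 'b', 'c', 'd'], name='gene'),
)
piv.columns.name = 'rep'

normalized = quantile_normalize_sketch(piv, [1, 2, 3])
print(normalized.round(2))
```

Other tie-handling conventions (for example averaging the means of the tied ranks) would give different values for the two 4s in column 2.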

Now my question:

How do I take these values and plug them back into the original data frame?

Recommended answer

You can merge your result back into the original dataframe after melting the result dataframe:

result = quantile_normalize(piv, [1, 2, 3])
result = result.reset_index().melt(id_vars='gene', value_name='quantile')
result
>>>   gene rep  quantile
0     a   1         5.666667
1     b   1         2.000000
2     c   1         3.000000
3     d   1         4.666667
4     a   2         4.666667
5     b   2         2.000000
6     c   2         4.666667
7     d   2         3.000000
8     a   3         2.000000
9     b   3         3.000000
10    c   3         4.666667
11    d   3         5.666667

df = pd.merge(df, result, on=['gene', 'rep'], how='outer')
df 
>>>    gene  rep subset  val  quantile
0     a    1      y    5  5.666667
1     a    2      y    4  4.666667
2     a    3      y    3  2.000000
3     b    1      y    2  2.000000
4     b    2      y    1  2.000000
5     b    3      y    4  3.000000
6     c    1      y    3  3.000000
7     c    2      y    4  4.666667
8     c    3      y    6  4.666667
9     d    1      y    4  4.666667
10    d    2      y    2  3.000000
11    d    3      y    8  5.666667
12    e    1      n    0       NaN
13    e    2      n    1       NaN
14    e    3      n    0       NaN
15    f    1      n    0       NaN
16    f    2      n    2       NaN
17    f    3      n    4       NaN
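For reference, the whole round trip can be sketched as a self-contained script. This is a variation on the answer rather than a replacement for it: the normalized values are hard-coded from the output shown in the question (standing in for the call to quantile_normalize), the names normalized, long_form and combined are illustrative, and how='left' is used instead of how='outer' on the assumption that every normalized row corresponds to a row in the original frame.

```python
import pandas as pd

df = pd.DataFrame({
    'gene': list('aaabbbcccdddeeefff'),
    'rep': [1, 2, 3] * 6,
    'val': [5, 4, 3, 2, 1, 4, 3, 4, 6, 4, 2, 8, 0, 1, 0, 0, 2, 4],
    'subset': ['y'] * 12 + ['n'] * 6,
})

# Stand-in for quantile_normalize(piv, [1, 2, 3]); values copied
# from the result table in the question (17/3 = 5.67, 14/3 = 4.67).
normalized = pd.DataFrame(
    {
        1: [17 / 3, 2.0, 3.0, 14 / 3],
        2: [14 / 3, 2.0, 14 / 3, 3.0],
        3: [2.0, 3.0, 14 / 3, 17 / 3],
    },
    index=pd.Index(['a', 'b', 'c', 'd'], name='gene'),
)

# Wide -> long: one row per (gene, rep) pair.
long_form = normalized.reset_index().melt(
    id_vars='gene', var_name='rep', value_name='quantile'
)
# Make the key dtype match df['rep'] before merging.
long_form['rep'] = long_form['rep'].astype(int)

# A left merge keeps every original row in its original order;
# validate raises if a (gene, rep) key is duplicated on either side.
combined = df.merge(
    long_form, on=['gene', 'rep'], how='left', validate='one_to_one'
)
print(combined)
```

Rows for genes e and f have no match in the normalized table, so their quantile column is NaN, as in the output above.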
