How to read two lines of data from the same column to create combinations of values from that column?

Question

In the following data:

M1  M2  M3  M4  M5  M6  M7  M8  Hx Hy    S1    S2    S3    S4
A   T   T   A   A   G   A   C   A   C    C     G     C     T
A   T   T   A   A   G   A   C   A   C    C     G     C     T
T   G   C   T   G   T   T   G   T   A    A     T     A     T
C   A   A   C   A   G   T   C   C   G    G     A     C     G
G   T   G   T   A   T   C   T   G   T    C     T     T     T

Using the following code:

d1 = d1.add('g').add(d1.shift()).dropna()

I get:

M1   M2   M3   M4   M5   M6   M7   M8   Hx   Hy   S1   S2   S3   S4
AgA  TgT  TgT  AgA  AgA  GgG  AgA  CgC  AgA  CgC  CgC  GgG  CgC  TgT   
TgA  GgT  CgT  TgA  GgA  TgG  TgA  GgC  TgA  AgC  AgC  TgG  AgC  TgT   
CgT  AgG  AgC  CgT  AgG  GgT  TgT  CgG  CgT  GgA  GgA  AgT  CgA  GgT   
GgC  TgA  GgA  TgC  AgA  TgG  CgT  TgC  GgC  TgG  CgG  TgA  TgC  TgG 
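
For reference, the shift-and-concatenate step can be reproduced on a small frame. This is only a sketch: the two columns are copied from the sample data above, and the name `paired` is an illustrative assumption.

```python
import pandas as pd

# Two columns taken from the sample data above (illustrative subset).
d1 = pd.DataFrame({"M1": list("AATCG"), "M2": list("TTGAT")})

# current value + 'g' + value from the previous row.
# The first row has no predecessor, so it becomes NaN and is dropped.
paired = d1.add("g").add(d1.shift()).dropna()

print(paired)
```

Each resulting cell reads "current row, g, previous row", which is why the output has one row fewer than the input.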

But if the data has the following structure:

M1   M2   M3  M4     Hx  Hy   S1  S2        pos  
A/T  T/A  A/G  G/G    A    C    C/G  C/T    2
A/T  T/A  A/G  G/G    G    T    C/G  C/T    12
T/G  C/T  G/T  T/G    C    G    T/T  T/T    16
T/T  T/T  T/T  T|T    G    T    T/T  T/T    17

I instead want all possible combinations of letters (between the previous line and the current line) for each column except pos.

So the result would be:

M1                M2               Hx    Hy      S1                S2                                               
AgA,AgT,TgA,TgT  TgT,TgA,AgT,AgA   AgA   TgC   CgC,CgG,GgC,GgG    CgC,CgT,TgC,TgT
TgA,TgT,GgA,GgT ....
and so on for all the other lines.

I am adding a matrix to illustrate the process:

values from previous line in m1 (at pos 12)
                                  A       T
value from next            T     TgA     TgT
next line  pos 16 ->       G     GgA     GgT

I tried to use itertools, first collecting the values of each row into a list:

temp = []  # one list of cell values per row
for index, data in d1_group.iterrows():
    temp.append(data.tolist())
print(temp)

Next, my thought is to use the index (or pos) as keys and then create combinations between the values at adjacent index (or pos) entries.
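
The dictionary idea described above could look roughly like the following. This is a sketch under assumptions: `rows`, `alleles`, and `combos` are illustrative names, only the M1 and Hx columns of the sample are included, and `/` is assumed to be the only separator.

```python
import itertools as it

# Rows keyed by pos; only M1 and Hx from the sample are shown here.
rows = {
    2:  {"M1": "A/T", "Hx": "A"},
    12: {"M1": "A/T", "Hx": "G"},
    16: {"M1": "T/G", "Hx": "C"},
}

def alleles(cell):
    """Split a cell such as 'A/T' into its letters; 'A' stays ['A']."""
    return cell.split("/")

positions = sorted(rows)
combos = {}
# Walk adjacent (previous, current) position pairs.
for prev_pos, cur_pos in zip(positions, positions[1:]):
    combos[cur_pos] = {
        col: ",".join(
            "g".join(pair)
            for pair in it.product(alleles(rows[cur_pos][col]),
                                   alleles(rows[prev_pos][col]))
        )
        for col in rows[cur_pos]
    }

print(combos[16]["M1"])
```

The result is keyed by the current pos, so the first position (which has no previous line) does not appear in `combos`.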

Is there any way to do this using pandas or a dictionary?

Thanks,

Recommended answer

Preliminaries:

import itertools as it

list(it.product(['A'], ['T']))
Out[229]: [('A', 'T')]

list(it.product(['A', 'T'], ['T', 'G']))
Out[230]: [('A', 'T'), ('A', 'G'), ('T', 'T'), ('T', 'G')]

','.join('g'.join(t) for t in it.product(['A'], ['T']))
Out[231]: 'AgT'

','.join('g'.join(t) for t in it.product(['T', 'G'],['A', 'T']))
Out[233]: 'TgA,TgT,GgA,GgT'

So let's build a dataframe that contains this:

df=df.applymap(lambda c: [[c]])

df
Out[258]: 
      M1     M2     M3     M4     M5     M6     M7     M8     Hx     Hy  \
0  [[A]]  [[T]]  [[T]]  [[A]]  [[A]]  [[G]]  [[A]]  [[C]]  [[A]]  [[C]]   
1  [[A]]  [[T]]  [[T]]  [[A]]  [[A]]  [[G]]  [[A]]  [[C]]  [[A]]  [[C]]   
2  [[T]]  [[G]]  [[C]]  [[T]]  [[G]]  [[T]]  [[T]]  [[G]]  [[T]]  [[A]]   
3  [[C]]  [[A]]  [[A]]  [[C]]  [[A]]  [[G]]  [[T]]  [[C]]  [[C]]  [[G]]   
4  [[G]]  [[T]]  [[G]]  [[T]]  [[A]]  [[T]]  [[C]]  [[T]]  [[G]]  [[T]]  

(df+df.shift(1)).dropna(how='all').applymap(lambda c: ','.join('g'.join(t)
                                                      for t in it.product(*c)))
Out[266]: 
    M1   M2   M3   M4   M5   M6   M7   M8   Hx   Hy   S1   S2   S3   S4
1  AgA  TgT  TgT  AgA  AgA  GgG  AgA  CgC  AgA  CgC  CgC  GgG  CgC  TgT
2  TgA  GgT  CgT  TgA  GgA  TgG  TgA  GgC  TgA  AgC  AgC  TgG  AgC  TgT
3  CgT  AgG  AgC  CgT  AgG  GgT  TgT  CgG  CgT  GgA  GgA  AgT  CgA  GgT
4  GgC  TgA  GgA  TgC  AgA  TgG  CgT  TgC  GgC  TgG  CgG  TgA  TgC  TgG
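
The reason each cell is wrapped in a nested list is that `+` on two lists concatenates them, so adding the current cell to the shifted (previous) cell yields a two-element list that can be unpacked straight into `it.product`. A minimal pure-Python sketch of that mechanism, with illustrative variable names:

```python
import itertools as it

current = [["T", "G"]]    # cell from the current row, wrapped in an outer list
previous = [["A", "T"]]   # cell from the previous row, wrapped the same way

# List concatenation: [['T', 'G'], ['A', 'T']]
stacked = current + previous

# Unpacking gives product(['T', 'G'], ['A', 'T']).
joined = ",".join("g".join(t) for t in it.product(*stacked))
print(joined)
```

Note that `DataFrame.applymap` is deprecated in recent pandas releases in favour of `DataFrame.map`, so the calls shown here may emit a warning or need renaming depending on the installed version.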

Now the same for the pairs, with just a bit more cleanup/preparation:

df.set_index('pos', inplace=True)

df
Out[273]: 
      M1   M2   M3   M4 Hx Hy   S1   S2
pos                                    
2    A/T  T/A  A/G  G/G  A  C  C/G  C/T
12   A/T  T/A  A/G  G/G  G  T  C/G  C/T
16   T/G  C/T  G/T  T/G  C  G  T/T  T/T
17   T/T  T/T  T/T  T|T  G  T  T/T  T/T

df = df.applymap(lambda c: [c.split('/')])
df
Out[274]: 
           M1        M2        M3        M4     Hx     Hy        S1        S2
pos                                                                          
2    [[A, T]]  [[T, A]]  [[A, G]]  [[G, G]]  [[A]]  [[C]]  [[C, G]]  [[C, T]]
12   [[A, T]]  [[T, A]]  [[A, G]]  [[G, G]]  [[G]]  [[T]]  [[C, G]]  [[C, T]]
16   [[T, G]]  [[C, T]]  [[G, T]]  [[T, G]]  [[C]]  [[G]]  [[T, T]]  [[T, T]]
17   [[T, T]]  [[T, T]]  [[T, T]]   [[T|T]]  [[G]]  [[T]]  [[T, T]]  [[T, T]]



(df+df.shift(1)).dropna(how='all').applymap(lambda c: ','.join('g'.join(t) for t in it.product(*c)))
Out[276]: 
                  M1               M2               M3               M4   Hx  \
pos                                                                            
12   AgA,AgT,TgA,TgT  TgT,TgA,AgT,AgA  AgA,AgG,GgA,GgG  GgG,GgG,GgG,GgG  GgA   
16   TgA,TgT,GgA,GgT  CgT,CgA,TgT,TgA  GgA,GgG,TgA,TgG  TgG,TgG,GgG,GgG  CgG   
17   TgT,TgG,TgT,TgG  TgC,TgT,TgC,TgT  TgG,TgT,TgG,TgT      T|TgT,T|TgG  GgC   

      Hy               S1               S2  
pos                                         
12   TgC  CgC,CgG,GgC,GgG  CgC,CgT,TgC,TgT  
16   GgT  TgC,TgG,TgC,TgG  TgC,TgT,TgC,TgT  
17   TgG  TgT,TgT,TgT,TgT  TgT,TgT,TgT,TgT  

You can now reset the index to get pos back. You may need to adjust it by shifting it and aligning it appropriately.
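
As a sketch of that last step: the frame below copies the Hx and Hy results shown above, and the `prev_pos` column is a hypothetical addition that records which earlier position each row was paired with.

```python
import pandas as pd

# Hx/Hy results copied from the output above, indexed by the current pos.
result = pd.DataFrame(
    {"Hx": ["GgA", "CgG", "GgC"], "Hy": ["TgC", "GgT", "TgG"]},
    index=pd.Index([12, 16, 17], name="pos"),
)

positions = [2, 12, 16, 17]          # all positions in the original data

flat = result.reset_index()          # pos becomes a regular column again
# Hypothetical extra column: the previous position paired with each row.
flat.insert(0, "prev_pos", positions[:-1])

print(flat)
```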
