How to read two adjacent rows of the same column to create combinations of that column's values?
Problem description
Given the following data:
M1 M2 M3 M4 M5 M6 M7 M8 Hx Hy S1 S2 S3 S4
A T T A A G A C A C C G C T
A T T A A G A C A C C G C T
T G C T G T T G T A A T A T
C A A C A G T C C G G A C G
G T G T A T C T G T C T T T
and using the following code:
d1 = d1.add('g').add(d1.shift()).dropna()
I get:
M1 M2 M3 M4 M5 M6 M7 M8 Hx Hy S1 S2 S3 S4
AgA TgT TgT AgA AgA GgG AgA CgC AgA CgC CgC GgG CgC TgT
TgA GgT CgT TgA GgA TgG TgA GgC TgA AgC AgC TgG AgC TgT
CgT AgG AgC CgT AgG GgT TgT CgG CgT GgA GgA AgT CgA GgT
GgC TgA GgA TgC AgA TgG CgT TgC GgC TgG CgG TgA TgC TgG
However, if the data has the following structure:
M1 M2 M3 M4 Hx Hy S1 S2 pos
A/T T/A A/G G/G A C C/G C/T 2
A/T T/A A/G G/G G T C/G C/T 12
T/G C/T G/T T/G C G T/T T/T 16
T/T T/T T/T T|T G T T/T T/T 17
I instead want the combinations of all possible letters (between the previous and the current line) for each column except pos.
So the result would be:
M1 M2 Hx Hy S1 S2
AgA,AgT,TgA,TgT TgT,TgA,AgT,AgA AgA TgC CgC,CgG,GgC,GgG CgC,CgT,TgC,TgT
TgA,TgT,GgA,GgT ....
and so on for all the other lines.
I am adding a matrix to illustrate the process:
                         values from previous line in M1 (at pos 12)
                         A      T
value from next      T   TgA    TgT
line (pos 16) ->     G   GgA    GgT
I tried to use itertools by first collecting the values of each row as a list:
temp = []
for row in d1_group.iterrows():
    index, data = row
    temp.append(data.tolist())
print(temp)
My next thought is to use the index (or pos) as keys and then create combinations between the values at adjacent indexes (or pos).
Is there any way to do this using pandas or a dictionary?
Thanks,
Recommended answer
Preamble:
import itertools as it
list(it.product(['A'], ['T']))
Out[229]: [('A', 'T')]
list(it.product(['A', 'T'], ['T', 'G']))
Out[230]: [('A', 'T'), ('A', 'G'), ('T', 'T'), ('T', 'G')]
','.join('g'.join(t) for t in it.product(['A'], ['T']))
Out[231]: 'AgT'
','.join('g'.join(t) for t in it.product(['T', 'G'],['A', 'T']))
Out[233]: 'TgA,TgT,GgA,GgT'
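The preamble above can be wrapped into a small helper. This is only a sketch: the name combine and its link parameter are made up here for illustration and do not come from the answer.

```python
import itertools as it

def combine(current, previous, link="g"):
    """Join every letter of `current` with every letter of `previous`.

    `combine` and `link` are hypothetical names, not part of the original answer.
    """
    return ",".join(link.join(pair) for pair in it.product(current, previous))

print(combine(["A"], ["T"]))            # AgT
print(combine(["T", "G"], ["A", "T"]))  # TgA,TgT,GgA,GgT
```

The order of the arguments matters: the first list supplies the letter before the g, so passing the current row first reproduces the TgA,TgT,GgA,GgT layout shown above.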
So let's build a dataframe that contains this:
df=df.applymap(lambda c: [[c]])
df
Out[258]:
M1 M2 M3 M4 M5 M6 M7 M8 Hx Hy \
0 [[A]] [[T]] [[T]] [[A]] [[A]] [[G]] [[A]] [[C]] [[A]] [[C]]
1 [[A]] [[T]] [[T]] [[A]] [[A]] [[G]] [[A]] [[C]] [[A]] [[C]]
2 [[T]] [[G]] [[C]] [[T]] [[G]] [[T]] [[T]] [[G]] [[T]] [[A]]
3 [[C]] [[A]] [[A]] [[C]] [[A]] [[G]] [[T]] [[C]] [[C]] [[G]]
4 [[G]] [[T]] [[G]] [[T]] [[A]] [[T]] [[C]] [[T]] [[G]] [[T]]
(df+df.shift(1)).dropna(how='all').applymap(lambda c: ','.join('g'.join(t)
for t in it.product(*c)))
Out[266]:
M1 M2 M3 M4 M5 M6 M7 M8 Hx Hy S1 S2 S3 S4
1 AgA TgT TgT AgA AgA GgG AgA CgC AgA CgC CgC GgG CgC TgT
2 TgA GgT CgT TgA GgA TgG TgA GgC TgA AgC AgC TgG AgC TgT
3 CgT AgG AgC CgT AgG GgT TgT CgG CgT GgA GgA AgT CgA GgT
4 GgC TgA GgA TgC AgA TgG CgT TgC GgC TgG CgG TgA TgC TgG
Now the same for the pairs, with just a bit more cleanup/preparation:
df.set_index('pos', inplace=True)
df
Out[273]:
M1 M2 M3 M4 Hx Hy S1 S2
pos
2 A/T T/A A/G G/G A C C/G C/T
12 A/T T/A A/G G/G G T C/G C/T
16 T/G C/T G/T T/G C G T/T T/T
17 T/T T/T T/T T|T G T T/T T/T
df = df.applymap(lambda c: [c.split('/')])
df
Out[274]:
M1 M2 M3 M4 Hx Hy S1 S2
pos
2 [[A, T]] [[T, A]] [[A, G]] [[G, G]] [[A]] [[C]] [[C, G]] [[C, T]]
12 [[A, T]] [[T, A]] [[A, G]] [[G, G]] [[G]] [[T]] [[C, G]] [[C, T]]
16 [[T, G]] [[C, T]] [[G, T]] [[T, G]] [[C]] [[G]] [[T, T]] [[T, T]]
17 [[T, T]] [[T, T]] [[T, T]] [[T|T]] [[G]] [[T]] [[T, T]] [[T, T]]
(df+df.shift(1)).dropna(how='all').applymap(lambda c: ','.join('g'.join(t) for t in it.product(*c)))
Out[276]:
M1 M2 M3 M4 Hx \
pos
12 AgA,AgT,TgA,TgT TgT,TgA,AgT,AgA AgA,AgG,GgA,GgG GgG,GgG,GgG,GgG GgA
16 TgA,TgT,GgA,GgT CgT,CgA,TgT,TgA GgA,GgG,TgA,TgG TgG,TgG,GgG,GgG CgG
17 TgT,TgG,TgT,TgG TgC,TgT,TgC,TgT TgG,TgT,TgG,TgT T|TgT,T|TgG GgC
Hy S1 S2
pos
12 TgC CgC,CgG,GgC,GgG CgC,CgT,TgC,TgT
16 GgT TgC,TgG,TgC,TgG TgC,TgT,TgC,TgT
17 TgG TgT,TgT,TgT,TgT TgT,TgT,TgT,TgT
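Two things stand out in the output above: the T|T cell is not split, because only '/' is used as a separator, and cells such as G/G produce repeated pairs. If that is not wanted, one possible variation is sketched below. It assumes that '|' should be treated like '/' and that duplicates should be dropped, neither of which the question states; split_alleles and combine_unique are hypothetical helper names.

```python
import itertools as it
import re

def split_alleles(cell):
    # Assumption: both '/' and '|' separate the two letters of a cell.
    return re.split(r"[/|]", cell)

def combine_unique(current, previous, link="g"):
    # Build every current-g-previous pair, then drop repeats while
    # keeping first-seen order (dict keys preserve insertion order).
    pairs = (
        link.join(pair)
        for pair in it.product(split_alleles(current), split_alleles(previous))
    )
    return ",".join(dict.fromkeys(pairs))

print(combine_unique("T|T", "T/G"))  # TgT,TgG
print(combine_unique("G/G", "G/G"))  # GgG
print(combine_unique("T/G", "A/T"))  # TgA,TgT,GgA,GgT
```

Single-letter cells such as those in Hx and Hy pass through unchanged, since splitting them yields a one-element list.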
You can now reset the index to get pos back. You might need to adjust it by shifting it and aligning it appropriately.
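To make the alignment explicit, here is a dependency-free sketch that pairs each row with the one before it and records both positions. It uses plain Python lists and dicts instead of a dataframe, and only two of the columns, purely to keep the example short; pair_adjacent and the prev_pos key are names invented for this illustration.

```python
import itertools as it

columns = ["M1", "Hx"]
# (pos, row) tuples taken from the first three lines of the example data.
rows = [
    (2,  {"M1": "A/T", "Hx": "A"}),
    (12, {"M1": "A/T", "Hx": "G"}),
    (16, {"M1": "T/G", "Hx": "C"}),
]

def pair_adjacent(rows, columns, link="g"):
    """Combine each row with the previous one, keeping both positions."""
    result = []
    # zip(rows, rows[1:]) walks over (previous, current) neighbours.
    for (prev_pos, prev), (pos, cur) in zip(rows, rows[1:]):
        record = {"prev_pos": prev_pos, "pos": pos}
        for col in columns:
            pairs = it.product(cur[col].split("/"), prev[col].split("/"))
            record[col] = ",".join(link.join(p) for p in pairs)
        result.append(record)
    return result

paired = pair_adjacent(rows, columns)
for record in paired:
    print(record)
```

Each record is labelled with the current pos, as in the dataframe output above, while prev_pos shows which earlier line it was combined with.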