Matlab: Removing duplicate interactions


Problem description


I have protein-protein interaction data for Homo sapiens. The matrix is 4850628x3: the first two columns are the proteins and the third is the confidence score of their interaction. The problem is that half of the rows are duplicate pairs.

If protein A interacts with B, C, and D, it is listed as:

  • A B 0.8
  • A C 0.5
  • A D 0.6
  • B A 0.8
  • C A 0.5
  • D A 0.6

Note that the confidence score of A interacting with B is the same as that of B interacting with A, namely 0.8.

So in a 4850628x3 matrix, half of the rows are duplicate pairs. If I use Unique(1,:) I might lose some data.

What I want is a 2425314x3 matrix, i.e. one without duplicate pairs. How can I do this efficiently?

Thanks, Naresh

Solution

Suppose that each protein is stored in your matrix as a unique numeric ID (e.g. A=1, B=2, C=3, ...). Your example matrix would then be:

M =

    1.0000    2.0000    0.8000
    1.0000    3.0000    0.5000
    1.0000    4.0000    0.6000
    2.0000    1.0000    0.8000
    3.0000    1.0000    0.5000
    4.0000    1.0000    0.6000
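
If the proteins are stored as names rather than numbers, one possible way to obtain such IDs is sketched below. It assumes the two protein columns are available as cell arrays of strings; the variable names `p1`, `p2`, and `score` are hypothetical and do not come from the question.

```matlab
% Hypothetical input: protein names in two cell arrays plus a score vector.
p1    = {'A'; 'A'; 'A'; 'B'; 'C'; 'D'};
p2    = {'B'; 'C'; 'D'; 'A'; 'A'; 'A'};
score = [0.8; 0.5; 0.6; 0.8; 0.5; 0.6];

% unique returns the sorted distinct names and, as its third output,
% the position of every input name in that list. That position is the ID.
[names, ~, id] = unique([p1; p2]);
id = id(:);                          % force a column vector

n = numel(p1);
M = [id(1:n), id(n+1:end), score];   % numeric n-by-3 matrix as shown above
```

`names{k}` maps an ID `k` back to the protein name, so the filtered result can be translated back afterwards.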

First sort the first two columns within each row, so that every protein pair always appears in the same order:

M2 = sort(M(:,1:2),2)

M2 =

     1     2
     1     3
     1     4
     1     2
     1     3
     1     4

Then call unique with the 'rows' option and keep the indices of the unique pairs:

[~, idx] = unique(M2, 'rows')

idx =

     1
     2
     3
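
By default unique returns the pairs in sorted order, so the rows of the result may be reordered relative to the input. If the original row order matters, a possible variant is sketched below. It assumes a MATLAB release in which unique accepts the 'stable' option; the small matrix is made up for illustration.

```matlab
% Small made-up example whose rows are not already in sorted order.
M  = [3 1 0.5; 1 2 0.8; 1 3 0.5; 2 1 0.8];
M2 = sort(M(:,1:2), 2);

% 'stable' keeps the pairs in order of first appearance instead of sorting them.
[~, idx] = unique(M2, 'rows', 'stable');
R = M(idx, :);   % keeps rows 1 and 2 of M, in that order
```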

Finally, filter your initial matrix to keep only the unique pairs:

R = M(idx,:)

R =

    1.0000    2.0000    0.8000
    1.0000    3.0000    0.5000
    1.0000    4.0000    0.6000

Et voilà!
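
The steps above can be collected into a short script. The second part is a sketch of a possible shortcut that is not part of the answer: if every pair really appears exactly once in each direction and no protein is paired with itself (an assumption that should be checked on the real data), a logical mask on the ID columns keeps the same pairs without calling unique.

```matlab
M = [1 2 0.8; 1 3 0.5; 1 4 0.6; 2 1 0.8; 3 1 0.5; 4 1 0.6];

% Approach from the answer: order each pair, then keep one row per pair.
M2 = sort(M(:,1:2), 2);
[~, idx] = unique(M2, 'rows');
R = M(idx, :);

% Alternative (assumption: each pair occurs in both directions):
% keep only the rows whose first ID is smaller than the second.
R2 = M(M(:,1) < M(:,2), :);

% Sanity checks: half the rows remain, and both approaches keep the same pairs.
assert(size(R, 1) == size(M, 1) / 2);
assert(isequal(sortrows(sort(R(:,1:2), 2)), sortrows(R2(:,1:2))));
```

The mask is a single vectorized comparison, but it would silently drop any pair that is listed only in the "larger ID first" direction, so the unique-based version is the safer choice when that assumption is in doubt.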
