Making pairs of words based on one column

Question

I want to make pairs of words based on the second column (the identifier). My file is similar to this example:

A ID.1
B ID.2
C ID.1
D ID.1
E ID.2
F ID.3  

The result I want is:

A C ID.1
A D ID.1
B E ID.2
C D ID.1

Note that I don't want the same word pair to appear again in the opposite order. In my real file, some words appear more than once, with different identifiers.
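To illustrate these two requirements, one possible approach (a swapped-in technique, not the asker's code) is to collect the words of each identifier in a single pass and then print every pair inside a group once, always with the earlier word first. A minimal sketch, assuming whitespace-separated `word identifier` lines; the function name `pairs_by_id` is hypothetical. Because pairs are formed only inside an identifier group, a word that appears under several identifiers is paired within each of those groups, and a pair shared by two identifiers is printed once per identifier.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "word identifier" lines on stdin and prints
# "word1 word2 identifier" for every unordered pair sharing an identifier.
pairs_by_id() {
  awk '
    NF < 2 { next }                        # ignore blank or malformed lines
    seen[$2, $1]++ { next }                # ignore a word repeated under the same identifier
    {
      if (!($2 in n)) order[++ids] = $2    # remember identifiers in first-seen order
      group[$2, ++n[$2]] = $1              # append the word to its identifier group
    }
    END {
      for (k = 1; k <= ids; k++) {
        id = order[k]
        # i < j, so each pair is printed once and never in reverse order
        for (i = 1; i < n[id]; i++)
          for (j = i + 1; j <= n[id]; j++)
            print group[id, i], group[id, j], id
      }
    }
  '
}

# Usage with the sample data from the question
printf '%s\n' 'A ID.1' 'B ID.2' 'C ID.1' 'D ID.1' 'E ID.2' 'F ID.3' | pairs_by_id
```

The whole file is read once, instead of rereading the rest of the file for every line, at the cost of keeping all words in memory.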

I tried the following code. It works, but it takes a lot of time (and I don't know whether there are redundancies):

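counter=2
# For each line, read only the lines after it ('tail -n +N' skips the first N-1 lines)
# and pair its word with every later word that has exactly the same identifier.
while read -r f1 f2; do
  tail -n +"$counter" go_annotation.txt |
    awk -v w="$f1" -v id="$f2" '$2 == id {print w, $1, id}'
  ((counter++))
done < filtered_go_annotation.txt > go_network2.txt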

The 'tail' is used to skip the lines that have already been read, so each line is only compared with the lines that follow it.
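To see why the loop never produces a reversed pair, it helps to look at what `tail -n +K` does on its own. A small demonstration on three lines of the sample data: the counter in the loop is 2 while line 1 is processed, 3 for line 2, and so on, so each line is only matched against the lines below it. The same mechanism likely explains the running time: every iteration starts a new pipeline and rereads the rest of the file, so the total work grows roughly with the square of the number of lines.

```shell
#!/usr/bin/env bash
# tail -n +K starts printing at line K, i.e. it skips the first K-1 lines.
printf '%s\n' 'A ID.1' 'B ID.2' 'C ID.1' | tail -n +2
# prints:
# B ID.2
# C ID.1
```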

Answer

In two steps:

$ sort -k2 file > file.s
$ join -j2 file.s{,} | awk '!(a[$2,$3]++ + a[$3,$2]++){print $2,$3,$1}'

A C ID.1
A D ID.1
C D ID.1
B E ID.2
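The answer is compact, so here is a commented walk-through of the same pipeline on the question's sample data, run in a scratch directory (the directory and the intermediate `join` call are only for the demonstration). One consequence worth knowing: the awk counters are keyed on the two words only, not on the identifier, so a pair that shares several identifiers is reported just once, under the first identifier in sorted order.

```shell
#!/usr/bin/env bash
set -e
cd "$(mktemp -d)"   # work in a scratch directory

# Sample data from the question, saved as "file" like in the answer.
printf '%s\n' 'A ID.1' 'B ID.2' 'C ID.1' 'D ID.1' 'E ID.2' 'F ID.3' > file

# Step 1: join expects both inputs to be sorted on the join field (column 2).
sort -k2 file > file.s

# Step 2a: "file.s{,}" is bash brace expansion for "file.s file.s", so the file is
# joined with itself. Each output line is "identifier word1 word2", e.g. "ID.1 A C",
# and every word is paired with every word (itself included) of the same identifier.
join -j2 file.s file.s

# Step 2b: a[x,y] and a[y,x] count how often each ordering has been seen. A line is
# printed only when both counters were still 0, i.e. the first time the pair shows up
# in either order. A self-pair "x x" bumps the same counter twice, so it is dropped.
join -j2 file.s{,} | awk '!(a[$2,$3]++ + a[$3,$2]++){print $2,$3,$1}'
```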
