Making pairs of words based on one column
Question
I want to make pairs of words based on the identifier column (the second column in the input). My file is similar to this example:
A ID.1
B ID.2
C ID.1
D ID.1
E ID.2
F ID.3
The result I want is:
A C ID.1
A D ID.1
B E ID.2
C D ID.1
Note that I don't want to get the same word pair again in the opposite order. In my real file, some words appear more than once, with different identifiers.
I tried the code below. It works, but it takes a long time (and I don't know whether it does redundant work):
counter=2
cat filtered_go_annotation.txt | while read f1 f2; do
tail -n +$counter go_annotation.txt | grep $f2 | awk '{print "'$f1' " $1}';
((counter++))
done > go_network2.txt
The `tail` skips the lines that have already been read, so a word is only paired with the words that come after it.
Recommended answer
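In two steps: sort the file by identifier, then join it with itself on that column and keep each unordered pair only once.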
$ sort -k2 file > file.s
$ join -j2 file.s{,} | awk '!(a[$2,$3]++ + a[$3,$2]++){print $2,$3,$1}'
A C ID.1
A D ID.1
C D ID.1
B E ID.2
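
As an alternative to sort plus join, the same pairing can probably be done in a single awk pass that remembers the words seen so far under each identifier. The sketch below is not part of the original answer. `make_pairs` is a hypothetical function name, and it assumes the input is whitespace-separated with the word in column 1 and the identifier in column 2. One difference from the one-liner above: its `a[$2,$3]` key does not include the identifier, so a pair of words sharing two identifiers is printed only once there, whereas this version prints it once per identifier.

```shell
# Hypothetical helper (not from the original answer): print every unordered
# pair of words that share an identifier, followed by that identifier.
# Assumes column 1 is the word and column 2 is the identifier.
make_pairs() {
    awk '
    # Skip a repeated (identifier, word) line so no pair is emitted twice.
    seen[$2, $1]++ { next }
    {
        id = $2
        # Pair the current word with every earlier word under this identifier.
        for (i = 1; i <= n[id]; i++)
            print word[id, i], $1, id
        # Remember the current word for later lines with the same identifier.
        word[id, ++n[id]] = $1
    }' "$1"
}
```

With the file names from the question it could be called as `make_pairs go_annotation.txt > go_network2.txt`. Because each word is only paired with words that appeared earlier, a pair never shows up in the opposite order, and the input does not need to be sorted first.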