Read lines from a file, grep in a second file, and output a file for each $line
Problem description
I have the following two files:
sequences.txt
158333741 Acaryochloris_marina_MBIC11017_uid58167 158333741 432 1 432 COG0001 0
158339504 Acaryochloris_marina_MBIC11017_uid58167 158339504 491 1 491 COG0002 0
379012832 Acetobacterium_woodii_DSM_1030_uid88073 379012832 430 1 430 COG0001 0
302391336 Acetohalobium_arabaticum_DSM_5501_uid51423 302391336 441 1 441 COG0003 0
311103820 Achromobacter_xylosoxidans_A8_uid59899 311103820 425 1 425 COG0004 0
332795879 Acidianus_hospitalis_W1_uid66875 332795879 369 1 369 COG0005 0
332796307 Acidianus_hospitalis_W1_uid66875 332796307 416 1 416 COG0005 0
allids.txt
COG0001
COG0002
COG0003
COG0004
COG0005
Now I want to read each line in allids.txt, search all lines in sequences.txt for it (specifically in column 7), and, for each line in allids.txt, write the matching lines to a file named after $line.
My approach is to use a simple grep:
while IFS= read -r line; do
    # prints matches to stdout; the match is not restricted to column 7
    grep "$line" sequences.txt
done < allids.txt
But where do I incorporate the command for the output? If there is a faster command, feel free to suggest it!
My expected output:
COG0001.txt
158333741 Acaryochloris_marina_MBIC11017_uid58167 158333741 432 1 432 COG0001 0
379012832 Acetobacterium_woodii_DSM_1030_uid88073 379012832 430 1 430 COG0001 0
COG0002.txt
158339504 Acaryochloris_marina_MBIC11017_uid58167 158339504 491 1 491 COG0002 0
[and so on]
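To answer the question as asked, here is a minimal sketch that keeps the loop and redirects each iteration's output to "$id.txt". It swaps grep for awk so that only column 7 is compared and only whole-field matches count. A plain grep for COG0001 would also hit COG00011, or the ID appearing in another column. The function name split_by_id and its argument order are hypothetical. This version still rereads sequences.txt once per ID, so it slows down as the ID list grows.

```shell
# Hypothetical helper: split_by_id IDS_FILE SEQ_FILE
# For each ID (one per line) in IDS_FILE, write the lines of SEQ_FILE whose
# 7th whitespace-separated column equals that ID to "<ID>.txt".
split_by_id() {
  while IFS= read -r id; do
    # "$2" is SEQ_FILE: exact match on column 7 only.
    # The redirect truncates "<ID>.txt", so an ID with no matching
    # lines still gets an (empty) file.
    awk -v id="$id" '$7 == id' "$2" > "$id.txt"
  done < "$1"   # "$1" is IDS_FILE
}
```

Run it as split_by_id allids.txt sequences.txt from the directory that holds both files.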
Recommended answer
I suspect all you really need is:
awk '{print > ($7".txt")}' sequences.txt
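The one-liner keeps one output file open for every distinct column-7 value. gawk copes with many open files, but some other awk implementations may stop with an error once too many output files are open. Here is a hedged variant for that case. It assumes a sort that supports -s (stable sort, available in GNU and BSD sort) and sorts on column 7 first. Each file is then written in one contiguous run and can be closed before the next one opens. The function name split_by_col7 is hypothetical.

```shell
# Hypothetical helper: split_by_col7 SEQ_FILE
# Writes each line of SEQ_FILE to "<column 7>.txt", keeping at most one
# output file open at a time. Needs a sort that supports -s (stable).
split_by_col7() {
  # LC_ALL=C: plain byte comparison, so identical IDs end up adjacent.
  # -s keeps the original line order within each ID; -b -k 7,7 sorts on
  # column 7 only, ignoring leading blanks.
  LC_ALL=C sort -s -b -k 7,7 "$1" |
    awk '
      NF < 7 { next }                  # skip blank or short lines
      out == "" || $7 != prev {        # a new ID starts here
        if (out != "") close(out)      # finish the previous file
        out = $7 ".txt"
        prev = $7
      }
      { print > out }
    '
}
```

For the sample data, split_by_col7 sequences.txt should produce COG0001.txt through COG0005.txt. The lines in each file stay in their original order because of -s.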
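That suspicion is based on your IDs file being named allids.txt (note the "all") and on there being no IDs in sequences.txt that don't exist in allids.txt. If that holds, the ID list adds no information. A single pass over sequences.txt that writes each line to the file named after its column 7 produces the files you want, and it reads sequences.txt once instead of once per ID.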
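If that assumption does not hold and sequences.txt can contain IDs that are missing from allids.txt, a common awk idiom handles it. It reads the ID list first (NR == FNR is true only while reading the first file) and then writes only the lines whose column 7 appears in that list. This is again a sketch, and the function name split_listed_ids is hypothetical. Unlike a per-ID loop that redirects to "$id.txt", it creates no file for an ID that never appears in sequences.txt.

```shell
# Hypothetical helper: split_listed_ids IDS_FILE SEQ_FILE
# Writes SEQ_FILE lines to "<column 7>.txt", but only for IDs listed in IDS_FILE.
# Caveat: if IDS_FILE is empty, NR == FNR also holds while reading SEQ_FILE,
# so nothing is written.
split_listed_ids() {
  awk '
    NR == FNR { want[$1]; next }          # first file: remember each ID
    ($7 in want) { print > ($7 ".txt") }  # second file: keep listed IDs only
  ' "$1" "$2"
}
```

Usage: split_listed_ids allids.txt sequences.txt.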