Automatically rename FASTA files with the ID of the first sequence in each file
Question
I have multiple FASTA files in the same directory, each containing a single sequence. I want to rename each file using the header of the sequence it contains. When I run my code, I get "Substitution pattern not terminated at (user-supplied code)".
My code:
#!/bin/bash
for i in /home/maryem/files/;
do
if [ ! -f $i ]; then
echo "skipping $i";
else
newname=`head -1 $i | sed 's/^\s*\([a-zA-Z0-9]\+\).*$/\1/'`;
[ -n "$newname" ] ;
mv -i $i $newname.fasta || echo "error at: $i";
fi;
done | rename s/ // *.fasta
FASTA file:
>NC_013361.1 Escherichia coli O26:H11 str. 11368 DNA, complete genome
AGCTTTTCATTCTGACTGCAATGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTCTCTGACAGCAGCTTCTGAACTG
GTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAATATAGGCATAGCGCACAGAC
AGATAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACCATTATCACCACCATCACCATTACCACAGGT
Is there another way to rename each file with the ID from its header?
Recommended answer
Given that the ID is the first "word" of the header line in each file, you can run the following in the directory containing the FASTA files.
for f in *.fasta; do d="$(head -n 1 "$f" | awk '{sub(/^>/, ""); print $1}').fasta"; if [ ! -f "$d" ]; then mv "$f" "$d"; else echo "File '$d' already exists! Skipped '$f'"; fi; done
Credit: https://unix.stackexchange.com/a/13161
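For readability, the same idea can be written as a small function. This is only a sketch under a few assumptions: the name rename_fasta_by_id is made up for illustration, every file is assumed to start with a ">" header line, and the ID is assumed to contain no "/" character. It strips the leading ">" from the header so that it does not end up in the file name, and it never overwrites an existing file.

```shell
#!/bin/bash
# rename_fasta_by_id: hypothetical helper name, not part of the original answer.
# Renames each given FASTA file to <ID>.fasta, where <ID> is the first
# whitespace-delimited word of the header line with the leading '>' removed.
rename_fasta_by_id() {
    local f dir id target
    for f in "$@"; do
        # Skip anything that is not a regular file (e.g. an unmatched glob).
        if [ ! -f "$f" ]; then
            echo "skipping $f" >&2
            continue
        fi
        dir=$(dirname -- "$f")
        # Read the first line only, drop the '>' and keep the first word.
        id=$(head -n 1 -- "$f" | sed 's/^>//' | awk '{print $1}')
        if [ -z "$id" ]; then
            echo "no ID found in $f" >&2
            continue
        fi
        target="$dir/$id.fasta"
        if [ "$f" -ef "$target" ]; then
            continue  # the file already has the right name
        elif [ -e "$target" ]; then
            echo "File '$target' already exists! Skipped '$f'" >&2
        else
            mv -- "$f" "$target"
        fi
    done
}
```

With the layout from the question, it could be called as rename_fasta_by_id /home/maryem/files/*.fasta. As for the original error: it most likely comes from the trailing rename s/ // *.fasta. Without quotes the shell passes s/ and // as separate arguments, so rename sees an unterminated substitution; quoting the expression (rename 's/ //' *.fasta) avoids that. Note also that for i in /home/maryem/files/ loops over the directory itself rather than the files inside it, so the loop body only ever prints "skipping".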