Grep output adds extra dashes and newlines
Problem description
I'm using bash on a Mac to run some grep commands, with GNU grep installed via MacPorts. I'm trying to query a FASTA file (DNA sequences, with the sequence ID on one line and the DNA sequence on the following line) with grep, to output a subset of the file based on a list of query strings stored in a file. Currently I have a list of single words, one per line, plus the FASTA file, and I'm using the command
grep -A1 -f query_list.txt initial_file.fasta > query_subset.fasta
This almost produces the output I'm after, but in the output file there is a double dash on its own line after each sequence set that matches a string in the query file. I'm not sure why this is happening. I've tried removing the dashes with sed
sed 's/\n--\n/\n' query_subset.fasta > final.fasta
but that doesn't work. If I use the same find and replace in TextWrangler, it works fine.
As an example, the files look like
query_list.txt
SpeciesA
SpeciesC
initial_file.fasta
>SpeciesA
ACGTGATCGATCGAT
>SpeciesB
ACGGGTCTTAGTATCG
>SpeciesC
ACGTACGATCTTCAGT
>SpeciesD
ACGTTCAGTCAGTTCAG
query_subset.fasta
>SpeciesA
ACGTGATCGATCGAT
--
>SpeciesC
ACGTACGATCTTCAGT
--
I need this to be done via the command line, as I'm trying to build it into a script to automate some sample processing.
Any input is greatly appreciated! Cheers, Tris
grep -A1 -f query_list.txt initial_file.fasta | sed '/^--/d' > final.fasta
or
grep -A1 -f query_list.txt initial_file.fasta | grep -v '^--' > final.fasta
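Some background on why the dashes appear and why the sed attempt fails: when grep is run with context options (-A, -B, -C), it prints a line containing -- between non-adjacent groups of matches. sed reads its input one line at a time, and the newline is not part of the line it sees, so a pattern such as \n--\n never matches. Since the question mentions GNU grep, another option may be to turn the separator off at the source with --no-group-separator. The sketch below assumes GNU grep (the BSD grep shipped with macOS may not accept this flag) and recreates the sample files from the question in a scratch directory; the directory itself is only for illustration.

```shell
#!/usr/bin/env bash
# Recreate the sample files from the question in a temporary directory
# (the location is an assumption made for this sketch).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' SpeciesA SpeciesC > query_list.txt
printf '%s\n' \
  '>SpeciesA' ACGTGATCGATCGAT \
  '>SpeciesB' ACGGGTCTTAGTATCG \
  '>SpeciesC' ACGTACGATCTTCAGT \
  '>SpeciesD' ACGTTCAGTCAGTTCAG > initial_file.fasta

# GNU grep only: --no-group-separator suppresses the "--" line that
# -A/-B/-C otherwise print between non-adjacent groups of context.
grep --no-group-separator -A1 -f query_list.txt initial_file.fasta > final.fasta

cat final.fasta
```

With the sample data, final.fasta should contain only the SpeciesA and SpeciesC records, with no -- lines, so no second sed or grep -v pass is needed. If the flag is unavailable, the piped commands above remain the portable route.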