Grep output adds extra dashes and newlines

Problem description

I'm using bash on a Mac, with GNU grep installed through MacPorts. I'm trying to query a FASTA file (DNA sequences, with the sequence ID on one line and the DNA sequence on the following line) with grep, to output a subset of the file based on a list of query strings stored in a file. Currently I have a query list with one word per line, plus the FASTA file, and I'm using the command

grep -A1 -f query_list.txt initial_file.fasta > query_subset.fasta

This almost produces the output I'm after, but in the output file there is a double dash on its own line after each sequence set that matches a string in the query file. I'm not sure why it's happening. I've tried removing the dashes with sed

sed 's/\n--\n/\n' query_subset.fasta > final.fasta

but that doesn't work. If I use the same find and replace in TextWrangler, it works fine.

Anyway, as an example the files look like

query_list.txt

SpeciesA
SpeciesC

initial_file.fasta

>SpeciesA
ACGTGATCGATCGAT
>SpeciesB
ACGGGTCTTAGTATCG
>SpeciesC
ACGTACGATCTTCAGT
>SpeciesD
ACGTTCAGTCAGTTCAG

query_subset.fasta

>SpeciesA
ACGTGATCGATCGAT
--
>SpeciesC
ACGTACGATCTTCAGT
--

I need this to be done on the command line, as I'm trying to build it into a script to automate some sample processing.

Any input is greatly appreciated! Cheers, Tris

Solution

grep -A1 -f query_list.txt initial_file.fasta | sed '/^--/d' > final.fasta

or

grep -A1 -f query_list.txt initial_file.fasta | grep -v '^--' > final.fasta
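
Some background on why this happens, since the commands above only treat the symptom. When grep is run with context options such as -A, it prints a line containing -- between groups of output lines that are not adjacent in the input. The sed attempt in the question fails for two reasons: the s command is missing its closing delimiter, and sed reads its input one line at a time, so the text it matches against never contains the newlines that \n--\n is looking for. Deleting whole lines that consist of --, as in the two commands above, avoids both problems.

If the grep in use is GNU grep, as it is in the question, the separator can also be switched off at the source with --no-group-separator. The following is a minimal sketch under that assumption (the option is not part of POSIX grep and may be missing from other implementations). It recreates the example files from the question in a scratch directory, which is only there for illustration.

```shell
# Recreate the example files from the question in a scratch directory
# (the scratch directory is an illustrative assumption, not part of the question).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'SpeciesA' 'SpeciesC' > query_list.txt
printf '%s\n' \
  '>SpeciesA' 'ACGTGATCGATCGAT' \
  '>SpeciesB' 'ACGGGTCTTAGTATCG' \
  '>SpeciesC' 'ACGTACGATCTTCAGT' \
  '>SpeciesD' 'ACGTTCAGTCAGTTCAG' > initial_file.fasta

# Assumes GNU grep: --no-group-separator suppresses the "--" lines
# that -A would otherwise print between non-adjacent groups.
grep -A1 --no-group-separator -f query_list.txt initial_file.fasta > final.fasta

# Show the result: only the SpeciesA and SpeciesC records, no "--" lines.
cat final.fasta
```

With this option there is no second command in the pipeline, so a sequence line that happened to start with -- would not be dropped by accident. If the sed or grep -v filters are kept instead, anchoring the pattern as '^--$' would restrict the match to lines that contain nothing but the two dashes.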
