Change separator just between specific columns

Problem description

I am trying to change the separator only between columns 1 and 9. After that, I would like to keep the original separator.

These are the first lines of my file, shown both as plain text and as the output of od -c file:

#description: evidence-based annotation of the human genome (GRCh38), version 35 (Ensembl 101), mapped to GRCh37 with gencode-backmap
#provider: GENCODE
#contact: gencode-help@ebi.ac.uk
#format: gff3
#date: 2020-06-03
chr1 HAVANA gene 11869 14409 . + . gene_id "ENSG00000223972.5_4"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; level 2; hgnc_id "HGNC:37102"; havana_gene "OTTHUMG00000000961.2_4"; remap_status "full_contig"; remap_num_mappings 1; remap_target_status "overlap";
chr1 HAVANA transcript 11869 14409 . + . gene_id "ENSG00000223972.5_4"; transcript_id "ENST00000456328.2_1"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_name "DDX11L1-202"; level 2; transcript_support_level 1; hgnc_id "HGNC:37102"; tag "basic"; havana_gene "OTTHUMG00000000961.2_4"; havana_transcript "OTTHUMT00000362751.1_1"; remap_num_mappings 1; remap_status "full_contig"; remap_target_status "overlap";

0000000   #   d   e   s   c   r   i   p   t   i   o   n   :       e   v
0000020   i   d   e   n   c   e   -   b   a   s   e   d       a   n   n
0000040   o   t   a   t   i   o   n       o   f       t   h   e       h
0000060   u   m   a   n       g   e   n   o   m   e       (   G   R   C
0000100   h   3   8   )   ,       v   e   r   s   i   o   n       3   5
0000120       (   E   n   s   e   m   b   l       1   0   1   )   ,
0000140   m   a   p   p   e   d       t   o       G   R   C   h   3   7
0000160       w   i   t   h       g   e   n   c   o   d   e   -   b   a
0000200   c   k   m   a   p  \n   #   p   r   o   v   i   d   e   r   :
0000220       G   E   N   C   O   D   E  \n   #   c   o   n   t   a   c
0000240   t   :       g   e   n   c   o   d   e   -   h   e   l   p   @
0000260   e   b   i   .   a   c   .   u   k  \n   #   f   o   r   m   a
0000300   t   :       g   f   f   3  \n   #   d   a   t   e   :       2
0000320   0   2   0   -   0   6   -   0   3  \n   c   h   r   1       H
0000340   A   V   A   N   A       g   e   n   e       1   1   8   6   9
0000360       1   4   4   0   9       .       +       .       g   e   n
0000400   e   _   i   d       "   E   N   S   G   0   0   0   0   0   2
0000420   2   3   9   7   2   .   5   _   4   "   ;       g   e   n   e
0000440   _   t   y   p   e       "   t   r   a   n   s   c   r   i   b
0000460   e   d   _   u   n   p   r   o   c   e   s   s   e   d   _   p
0000500   s   e   u   d   o   g   e   n   e   "   ;       g   e   n   e
0000520   _   n   a   m   e       "   D   D   X   1   1   L   1   "   ;
0000540       l   e   v   e   l       2   ;       h   g   n   c   _   i
0000560   d       "   H   G   N   C   :   3   7   1   0   2   "   ;
0000600   h   a   v   a   n   a   _   g   e   n   e       "   O   T   T
0000620   H   U   M   G   0   0   0   0   0   0   0   0   9   6   1   .
0000640   2   _   4   "   ;       r   e   m   a   p   _   s   t   a   t
0000660   u   s       "   f   u   l   l   _   c   o   n   t   i   g   "
0000700   ;       r   e   m   a   p   _   n   u   m   _   m   a   p   p
0000720   i   n   g   s       1   ;       r   e   m   a   p   _   t   a
0000740   r   g   e   t   _   s   t   a   t   u   s       "   o   v   e
0000760   r   l   a   p   "   ;  \n   c   h   r   1       H   A   V   A
0001000   N   A       t   r   a   n   s   c   r   i   p   t       1   1
0001020   8   6   9       1   4   4   0   9       .       +       .    
0001040   g   e   n   e   _   i   d       "   E   N   S   G   0   0   0
0001060   0   0   2   2   3   9   7   2   .   5   _   4   "   ;       t
0001100   r   a   n   s   c   r   i   p   t   _   i   d       "   E   N
0001120   S   T   0   0   0   0   0   4   5   6   3   2   8   .   2   _
0001140   1   "   ;       g   e   n   e   _   t   y   p   e       "   t
0001160   r   a   n   s   c   r   i   b   e   d   _   u   n   p   r   o
0001200   c   e   s   s   e   d   _   p   s   e   u   d   o   g   e   n
0001220   e   "   ;       g   e   n   e   _   n   a   m   e       "   D
0001240   D   X   1   1   L   1   "   ;       t   r   a   n   s   c   r
0001260   i   p   t   _   t   y   p   e       "   p   r   o   c   e   s
0001300   s   e   d   _   t   r   a   n   s   c   r   i   p   t   "   ;
0001320       t   r   a   n   s   c   r   i   p   t   _   n   a   m   e
0001340       "   D   D   X   1   1   L   1   -   2   0   2   "   ;
0001360   l   e   v   e   l       2   ;       t   r   a   n   s   c   r
0001400   i   p   t   _   s   u   p   p   o   r   t   _   l   e   v   e

How can I convert it to:

#description: evidence-based annotation of the human genome (GRCh38), version 35 (Ensembl 101), mapped to GRCh37 with gencode-backmap
#provider: GENCODE
#contact: gencode-help@ebi.ac.uk
#format: gff3
#date: 2020-06-03
chr1    HAVANA  gene    11869   14409   .       +       .       gene_id "ENSG00000223972.5_4"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; level 2; hgnc_id "HGNC:37102"; havana_gene "OTTHUMG00000000961.2_4"; remap_status "full_contig"; remap_num_mappings 1; remap_target_status "overlap";
chr1    HAVANA  transcript      11869   14409   .       +       .       gene_id "ENSG00000223972.5_4"; transcript_id "ENST00000456328.2_1"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_name "DDX11L1-202"; level 2; transcript_support_level 1; hgnc_id "HGNC:37102"; tag "basic"; havana_gene "OTTHUMG00000000961.2_4"; havana_transcript "OTTHUMT00000362751.1_1"; remap_num_mappings 1; remap_status "full_contig"; remap_target_status "overlap";

0000000   #   d   e   s   c   r   i   p   t   i   o   n   :       e   v
0000020   i   d   e   n   c   e   -   b   a   s   e   d       a   n   n
0000040   o   t   a   t   i   o   n       o   f       t   h   e       h
0000060   u   m   a   n       g   e   n   o   m   e       (   G   R   C
0000100   h   3   8   )   ,       v   e   r   s   i   o   n       3   5
0000120       (   E   n   s   e   m   b   l       1   0   1   )   ,
0000140   m   a   p   p   e   d       t   o       G   R   C   h   3   7
0000160       w   i   t   h       g   e   n   c   o   d   e   -   b   a
0000200   c   k   m   a   p  \n   #   p   r   o   v   i   d   e   r   :
0000220       G   E   N   C   O   D   E  \n   #   c   o   n   t   a   c
0000240   t   :       g   e   n   c   o   d   e   -   h   e   l   p   @
0000260   e   b   i   .   a   c   .   u   k  \n   #   f   o   r   m   a
0000300   t   :       g   f   f   3  \n   #   d   a   t   e   :       2
0000320   0   2   0   -   0   6   -   0   3  \n   c   h   r   1  \t   H
0000340   A   V   A   N   A  \t   g   e   n   e  \t   1   1   8   6   9
0000360  \t   1   4   4   0   9  \t   .  \t   +  \t   .  \t   g   e   n
0000400   e   _   i   d       "   E   N   S   G   0   0   0   0   0   2
0000420   2   3   9   7   2   .   5   _   4   "   ;       g   e   n   e
0000440   _   t   y   p   e       "   t   r   a   n   s   c   r   i   b
0000460   e   d   _   u   n   p   r   o   c   e   s   s   e   d   _   p
0000500   s   e   u   d   o   g   e   n   e   "   ;       g   e   n   e
0000520   _   n   a   m   e       "   D   D   X   1   1   L   1   "   ;
0000540       l   e   v   e   l       2   ;       h   g   n   c   _   i
0000560   d       "   H   G   N   C   :   3   7   1   0   2   "   ;
0000600   h   a   v   a   n   a   _   g   e   n   e       "   O   T   T
0000620   H   U   M   G   0   0   0   0   0   0   0   0   9   6   1   .
0000640   2   _   4   "   ;       r   e   m   a   p   _   s   t   a   t
0000660   u   s       "   f   u   l   l   _   c   o   n   t   i   g   "
0000700   ;       r   e   m   a   p   _   n   u   m   _   m   a   p   p
0000720   i   n   g   s       1   ;       r   e   m   a   p   _   t   a
0000740   r   g   e   t   _   s   t   a   t   u   s       "   o   v   e
0000760   r   l   a   p   "   ;  \n   c   h   r   1  \t   H   A   V   A
0001000   N   A  \t   t   r   a   n   s   c   r   i   p   t  \t   1   1
0001020   8   6   9  \t   1   4   4   0   9  \t   .  \t   +  \t   .  \t
0001040   g   e   n   e   _   i   d       "   E   N   S   G   0   0   0
0001060   0   0   2   2   3   9   7   2   .   5   _   4   "   ;       t
0001100   r   a   n   s   c   r   i   p   t   _   i   d       "   E   N
0001120   S   T   0   0   0   0   0   4   5   6   3   2   8   .   2   _
0001140   1   "   ;       g   e   n   e   _   t   y   p   e       "   t
0001160   r   a   n   s   c   r   i   b   e   d   _   u   n   p   r   o
0001200   c   e   s   s   e   d   _   p   s   e   u   d   o   g   e   n
0001220   e   "   ;       g   e   n   e   _   n   a   m   e       "   D
0001240   D   X   1   1   L   1   "   ;       t   r   a   n   s   c   r
0001260   i   p   t   _   t   y   p   e       "   p   r   o   c   e   s
0001300   s   e   d   _   t   r   a   n   s   c   r   i   p   t   "   ;
0001320       t   r   a   n   s   c   r   i   p   t   _   n   a   m   e
0001340       "   D   D   X   1   1   L   1   -   2   0   2   "   ;
0001360   l   e   v   e   l       2   ;       t   r   a   n   s   c   r
0001400   i   p   t   _   s   u   p   p   o   r   t   _   l   e   v   e

As you can see, there is a header that I want to keep exactly as it is. After that, I just want to tab-separate the first 9 columns, so that everything after the eighth tab stays together as the ninth column with its spaces unchanged.

Thanks!

Recommended answer

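By default, sed s/.../.../ replaces only the first occurrence on each line. You can therefore repeat the substitution 8 times to turn the first 8 spaces into tabs. Lines starting with # are skipped, so the header stays unchanged. Note that \t in the replacement is understood as a tab by GNU sed; other sed implementations may not support it.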
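To see the first-occurrence behaviour in isolation, here is a minimal sketch on a made-up four-word line (the sample text and the variable names tab, once and twice are illustrative assumptions, not part of the question). It inserts a literal tab through a shell variable instead of \t, so it should not depend on GNU sed:

```shell
# Literal tab character, so the sed script does not rely on the \t escape.
tab=$(printf '\t')

# One substitution: only the first space on the line becomes a tab.
once=$(printf 'a b c d\n' | sed "s/ /${tab}/")

# Two substitutions in a row: the first two spaces become tabs.
twice=$(printf 'a b c d\n' | sed "s/ /${tab}/;s/ /${tab}/")

# od -c makes the tabs visible as \t, as in the question.
printf '%s\n%s\n' "$once" "$twice" | od -c
```

Each additional s/ / .../ converts the next remaining space, which is why eight of them produce nine tab-separated columns.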

In bash, the repetition can be generated with the brace expansion {1..8} and printf:

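# Build "s/ /\t/;" eight times; %.0s consumes each argument of {1..8} without printing it.
printf -v cmd 's/ /\\t/;%.0s' {1..8}
# Apply the substitutions only to lines that do not start with "#".
sed '/^#/!'"{$cmd}" yourFile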
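It can help to look at the script that printf generates before handing it to sed. The sketch below writes the eight arguments out by hand (which is what {1..8} expands to in bash) and uses command substitution instead of printf -v, purely so that it also runs in a plain POSIX shell; that change is an assumption made for illustration, not the answer's exact command.

```shell
# printf reuses its format once per argument. %.0s prints each argument
# truncated to zero characters, so only the fixed text "s/ /\t/;" is emitted.
cmd=$(printf 's/ /\\t/;%.0s' 1 2 3 4 5 6 7 8)

# Show the generated sed script.
printf '%s\n' "$cmd"
# prints: s/ /\t/;s/ /\t/;s/ /\t/;s/ /\t/;s/ /\t/;s/ /\t/;s/ /\t/;s/ /\t/;
```

The result is the same string as the body of the long version, so both forms give sed an identical script.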

For plain sh, use a loop or the long version:

sed '/^#/!{s/ /\t/;s/ /\t/;s/ /\t/;s/ /\t/;s/ /\t/;s/ /\t/;s/ /\t/;s/ /\t/;}' yourFile
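The loop variant for plain sh is mentioned but not shown. One possible sketch follows, assuming a shortened, made-up record in place of the real GFF line (the names input, script, output and fields are hypothetical). It builds the script with a literal tab, so it should work even where sed does not understand \t, and it finishes with an awk check that the data line has nine tab-separated fields.

```shell
# Hypothetical sample input: one header line and one space-separated record.
input='#format: gff3
chr1 HAVANA gene 11869 14409 . + . gene_id "X"; level 2;'

tab=$(printf '\t')   # literal tab character
script=''
i=0
# Append one substitution per iteration, eight in total.
while [ "$i" -lt 8 ]; do
    script="${script}s/ /${tab}/;"
    i=$((i + 1))
done

# Run the eight substitutions on every line that does not start with '#'.
output=$(printf '%s\n' "$input" | sed '/^#/!{'"$script"'}')
printf '%s\n' "$output"

# Sanity check: count tab-separated fields on non-header lines.
fields=$(printf '%s\n' "$output" | awk -F'\t' '!/^#/ { print NF }')
printf 'fields: %s\n' "$fields"
```

With this sample, the header line passes through untouched and the spaces inside the ninth column (the attribute list) are preserved.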
