Change separator just between specific columns
Question
I am trying to change the separator only between columns 1 and 9. After the ninth column, I would like to keep the original separator.
These are the first lines of my file, shown first as plain text and then as the output of od -c file:
#description: evidence-based annotation of the human genome (GRCh38), version 35 (Ensembl 101), mapped to GRCh37 with gencode-backmap
#provider: GENCODE
#contact: gencode-help@ebi.ac.uk
#format: gff3
#date: 2020-06-03
chr1 HAVANA gene 11869 14409 . + . gene_id "ENSG00000223972.5_4"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; level 2; hgnc_id "HGNC:37102"; havana_gene "OTTHUMG00000000961.2_4"; remap_status "full_contig"; remap_num_mappings 1; remap_target_status "overlap";
chr1 HAVANA transcript 11869 14409 . + . gene_id "ENSG00000223972.5_4"; transcript_id "ENST00000456328.2_1"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_name "DDX11L1-202"; level 2; transcript_support_level 1; hgnc_id "HGNC:37102"; tag "basic"; havana_gene "OTTHUMG00000000961.2_4"; havana_transcript "OTTHUMT00000362751.1_1"; remap_num_mappings 1; remap_status "full_contig"; remap_target_status "overlap";
0000000 # d e s c r i p t i o n : e v
0000020 i d e n c e - b a s e d a n n
0000040 o t a t i o n o f t h e h
0000060 u m a n g e n o m e ( G R C
0000100 h 3 8 ) , v e r s i o n 3 5
0000120 ( E n s e m b l 1 0 1 ) ,
0000140 m a p p e d t o G R C h 3 7
0000160 w i t h g e n c o d e - b a
0000200 c k m a p \n # p r o v i d e r :
0000220 G E N C O D E \n # c o n t a c
0000240 t : g e n c o d e - h e l p @
0000260 e b i . a c . u k \n # f o r m a
0000300 t : g f f 3 \n # d a t e : 2
0000320 0 2 0 - 0 6 - 0 3 \n c h r 1 H
0000340 A V A N A g e n e 1 1 8 6 9
0000360 1 4 4 0 9 . + . g e n
0000400 e _ i d " E N S G 0 0 0 0 0 2
0000420 2 3 9 7 2 . 5 _ 4 " ; g e n e
0000440 _ t y p e " t r a n s c r i b
0000460 e d _ u n p r o c e s s e d _ p
0000500 s e u d o g e n e " ; g e n e
0000520 _ n a m e " D D X 1 1 L 1 " ;
0000540 l e v e l 2 ; h g n c _ i
0000560 d " H G N C : 3 7 1 0 2 " ;
0000600 h a v a n a _ g e n e " O T T
0000620 H U M G 0 0 0 0 0 0 0 0 9 6 1 .
0000640 2 _ 4 " ; r e m a p _ s t a t
0000660 u s " f u l l _ c o n t i g "
0000700 ; r e m a p _ n u m _ m a p p
0000720 i n g s 1 ; r e m a p _ t a
0000740 r g e t _ s t a t u s " o v e
0000760 r l a p " ; \n c h r 1 H A V A
0001000 N A t r a n s c r i p t 1 1
0001020 8 6 9 1 4 4 0 9 . + .
0001040 g e n e _ i d " E N S G 0 0 0
0001060 0 0 2 2 3 9 7 2 . 5 _ 4 " ; t
0001100 r a n s c r i p t _ i d " E N
0001120 S T 0 0 0 0 0 4 5 6 3 2 8 . 2 _
0001140 1 " ; g e n e _ t y p e " t
0001160 r a n s c r i b e d _ u n p r o
0001200 c e s s e d _ p s e u d o g e n
0001220 e " ; g e n e _ n a m e " D
0001240 D X 1 1 L 1 " ; t r a n s c r
0001260 i p t _ t y p e " p r o c e s
0001300 s e d _ t r a n s c r i p t " ;
0001320 t r a n s c r i p t _ n a m e
0001340 " D D X 1 1 L 1 - 2 0 2 " ;
0001360 l e v e l 2 ; t r a n s c r
0001400 i p t _ s u p p o r t _ l e v e
How can I convert it into the following?
#description: evidence-based annotation of the human genome (GRCh38), version 35 (Ensembl 101), mapped to GRCh37 with gencode-backmap
#provider: GENCODE
#contact: gencode-help@ebi.ac.uk
#format: gff3
#date: 2020-06-03
chr1 HAVANA gene 11869 14409 . + . gene_id "ENSG00000223972.5_4"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; level 2; hgnc_id "HGNC:37102"; havana_gene "OTTHUMG00000000961.2_4"; remap_status "full_contig"; remap_num_mappings 1; remap_target_status "overlap";
chr1 HAVANA transcript 11869 14409 . + . gene_id "ENSG00000223972.5_4"; transcript_id "ENST00000456328.2_1"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_name "DDX11L1-202"; level 2; transcript_support_level 1; hgnc_id "HGNC:37102"; tag "basic"; havana_gene "OTTHUMG00000000961.2_4"; havana_transcript "OTTHUMT00000362751.1_1"; remap_num_mappings 1; remap_status "full_contig"; remap_target_status "overlap";
0000000 # d e s c r i p t i o n : e v
0000020 i d e n c e - b a s e d a n n
0000040 o t a t i o n o f t h e h
0000060 u m a n g e n o m e ( G R C
0000100 h 3 8 ) , v e r s i o n 3 5
0000120 ( E n s e m b l 1 0 1 ) ,
0000140 m a p p e d t o G R C h 3 7
0000160 w i t h g e n c o d e - b a
0000200 c k m a p \n # p r o v i d e r :
0000220 G E N C O D E \n # c o n t a c
0000240 t : g e n c o d e - h e l p @
0000260 e b i . a c . u k \n # f o r m a
0000300 t : g f f 3 \n # d a t e : 2
0000320 0 2 0 - 0 6 - 0 3 \n c h r 1 \t H
0000340 A V A N A \t g e n e \t 1 1 8 6 9
0000360 \t 1 4 4 0 9 \t . \t + \t . \t g e n
0000400 e _ i d " E N S G 0 0 0 0 0 2
0000420 2 3 9 7 2 . 5 _ 4 " ; g e n e
0000440 _ t y p e " t r a n s c r i b
0000460 e d _ u n p r o c e s s e d _ p
0000500 s e u d o g e n e " ; g e n e
0000520 _ n a m e " D D X 1 1 L 1 " ;
0000540 l e v e l 2 ; h g n c _ i
0000560 d " H G N C : 3 7 1 0 2 " ;
0000600 h a v a n a _ g e n e " O T T
0000620 H U M G 0 0 0 0 0 0 0 0 9 6 1 .
0000640 2 _ 4 " ; r e m a p _ s t a t
0000660 u s " f u l l _ c o n t i g "
0000700 ; r e m a p _ n u m _ m a p p
0000720 i n g s 1 ; r e m a p _ t a
0000740 r g e t _ s t a t u s " o v e
0000760 r l a p " ; \n c h r 1 \t H A V A
0001000 N A \t t r a n s c r i p t \t 1 1
0001020 8 6 9 \t 1 4 4 0 9 \t . \t + \t . \t
0001040 g e n e _ i d " E N S G 0 0 0
0001060 0 0 2 2 3 9 7 2 . 5 _ 4 " ; t
0001100 r a n s c r i p t _ i d " E N
0001120 S T 0 0 0 0 0 4 5 6 3 2 8 . 2 _
0001140 1 " ; g e n e _ t y p e " t
0001160 r a n s c r i b e d _ u n p r o
0001200 c e s s e d _ p s e u d o g e n
0001220 e " ; g e n e _ n a m e " D
0001240 D X 1 1 L 1 " ; t r a n s c r
0001260 i p t _ t y p e " p r o c e s
0001300 s e d _ t r a n s c r i p t " ;
0001320 t r a n s c r i p t _ n a m e
0001340 " D D X 1 1 L 1 - 2 0 2 " ;
0001360 l e v e l 2 ; t r a n s c r
0001400 i p t _ s u p p o r t _ l e v e
As you can see, there is a header that I want to keep exactly as it is. After that, I just want to tab-separate the first 9 columns. That way, everything after the last of those tabs stays together as part of the final column.
Thanks!
Recommended answer
By default, sed's s/.../.../ replaces only the first occurrence on each line. You can therefore repeat the substitution 8 times to convert the first 8 spaces. Here, we also skip lines starting with #.
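The first-occurrence behaviour can be checked on a tiny input. This is only a sketch: the two sample lines are made up for illustration, and a literal tab (held in the hypothetical variable tab) is used in place of \t, because \t in the replacement is understood by GNU sed but is not guaranteed by POSIX.

```shell
# A literal tab character, so the example does not rely on sed
# interpreting "\t" in the replacement (a GNU sed feature).
tab=$(printf '\t')

# Made-up sample: one header line that must stay untouched, one data line.
sample=$(printf '%s\n' '#format: gff3' 'a b c d')

# A single s/// without the g flag converts only the first space per line;
# the address /^#/! limits it to lines that do not start with '#'.
once=$(printf '%s\n' "$sample" | sed '/^#/!'"s/ /$tab/")

# Chaining the same command twice converts the first two spaces.
twice=$(printf '%s\n' "$sample" | sed '/^#/!'"{s/ /$tab/;s/ /$tab/;}")

# od -c makes the tabs visible as \t.
printf '%s\n' "$once" "$twice" | od -c
```

The address is kept in single quotes, as in the answer, so that an interactive bash does not treat the ! as a history expansion.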
In bash, the repetition can be generated with the brace expansion {1..8} and printf.
printf -v cmd 's/ /\\t/;%.0s' {1..8}
sed '/^#/!'"{$cmd}" yourFile
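To see why this printf call produces a repeated sed script, it may help to look at a shorter case. The sketch below writes three arguments out by hand instead of using {1..8}, and captures the output with command substitution instead of bash's -v option, so it should run in any POSIX shell.

```shell
# printf reuses its format string once per argument. "%.0s" consumes one
# argument but prints zero characters of it, so only the fixed text
# 's/ /\t/;' is emitted each time. The "\\" in the format becomes a single
# backslash in the output.
cmd=$(printf 's/ /\\t/;%.0s' 1 2 3)

printf '%s\n' "$cmd"
# prints: s/ /\t/;s/ /\t/;s/ /\t/;
```

With {1..8}, bash supplies eight arguments in the same way, so the generated script contains the substitution eight times.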
For plain sh, use a loop or the long version:
sed '/^#/!{s/ /\t/;s/ /\t/;s/ /\t/;s/ /\t/;s/ /\t/;s/ /\t/;s/ /\t/;s/ /\t/;}' yourFile
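The loop variant mentioned above is not spelled out in the answer. One possible sketch for plain sh follows. The variable names script, tab and result, and the shortened two-line sample standing in for yourFile, are assumptions made for illustration. A literal tab is inserted so the script does not depend on sed understanding \t.

```shell
#!/bin/sh
# Build "s/ /<TAB>/;" eight times with a POSIX loop (no brace expansion).
tab=$(printf '\t')
script=''
i=0
while [ "$i" -lt 8 ]; do
    script="${script}s/ /${tab}/;"
    i=$((i + 1))
done

# Shortened, made-up sample in place of yourFile: one header line and one
# data line whose ninth column itself contains spaces.
result=$(printf '%s\n' \
    '#date: 2020-06-03' \
    'chr1 HAVANA gene 11869 14409 . + . gene_id "X"; level 2;' |
    sed '/^#/!'"{$script}")

# The header is unchanged; only the first 8 spaces of the data line are tabs.
printf '%s\n' "$result" | od -c
```

To run it on a real file, replace the printf pipeline with the file name as the last argument to sed.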