在awk中修改文本文件 [英] modifying the text file in awk

查看：774 发布时间：2020/9/15 6:43:30 awk

本文介绍了在awk中修改文本文件的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有一个文本文件，如下例所示:

I have a text file like the following small example:

chr1    HAVANA  transcript  12010   13670   .   +   .   gene_id "ENSG00000223972.4"; transcript_id "ENST00000450305.2"; gene_type "pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; tr
anscript_type "transcribed_unprocessed_pseudogene"; transcript_status "KNOWN"; transcript_name "DDX11L1-001"; level 2; ont "PGO:0000005"; ont "PGO:0000019"; havana_gene "OTTHUMG00000000961.2"; havana_tran
script "OTTHUMT00000002844.2";
chr2    HAVANA  exon    53  955 .   +   .   gene_id "ENSG00000223972.4"; transcript_id "ENST00000450305.2"; gene_type "pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript
_type "transcribed_unprocessed_pseudogene"; transcript_status "KNOWN"; transcript_name "DDX11L1-001"; exon_number 1; exon_id "ENSE00001948541.1"; level 2; ont "PGO:0000005"; ont "PGO:0000019"; havana_gene
 "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000002844.2";

这个小例子的预期输出是:

the expected output for the small example is:

chr1    HAVANA  transcript  11998   12060   .   +   .   gene_id "ENSG00000223972.4"; transcript_id "ENST00000450305.2"; gene_type "pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; tr
anscript_type "transcribed_unprocessed_pseudogene"; transcript_status "KNOWN"; transcript_name "DDX11L1-001"; level 2; ont "PGO:0000005"; ont "PGO:0000019"; havana_gene "OTTHUMG00000000961.2"; havana_tran
script "OTTHUMT00000002844.2";
chr2    HAVANA  exon    41  103 .   +   .   gene_id "ENSG00000223972.4"; transcript_id "ENST00000450305.2"; gene_type "pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript
_type "transcribed_unprocessed_pseudogene"; transcript_status "KNOWN"; transcript_name "DDX11L1-001"; exon_number 1; exon_id "ENSE00001948541.1"; level 2; ont "PGO:0000005"; ont "PGO:0000019"; havana_gene
 "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000002844.2";

输入文件中的

有不同的行.每行以chr开头.每行都有一些列，分隔符可以是制表符或;". 我要从中创建一个新文件，其中仅第4列和第5列会有变化.实际上，新文件中的第4列为((column 4 in original file)-12)，而新文件中的第5列为((column 4 in original file)+50).输入文件和输出文件之间唯一的区别在于第4和第5列中的数字. 我尝试使用以下命令在awk中做到这一点:

in the input file, there are different lines. each line starts with chr. every line has some columns and separators are either tab or ";". I want to make a new file from this one in which there would be a change only in columns 4 and 5. in fact column 4 in the new file would be ((column 4 in original file)-12) and 5th column in the new file would be ((column 4 in original file)+50). the only difference between input file and output file in the numbers in 4th and 5th column. I tried to do that in awk using the following command:

awk 'BEGIN { FS="\t;" } {print  $1"\t"$2"\t"$3"\t"$4=$4-12"\t"$5=$4+50"\t"$6"\t"$7"\t"$8"\t"$9" "$10";"$11" "$12";"$13" "$14";"$15" "$16";"$17" "$18";"$19" "$20";"$21" "$22";"$23" "$24";"$25" "$26";"$27" "$28";"$29" "$30";"$31" "$32";"$33" "$34";"$35" "$36";"$37" "$38";" }' input.txt > test2.txt

当我运行代码时，它将返回此错误:

when I run the code, it would return this error:

awk: cmd. line:1: BEGIN { FS="\t;" } {print  $1"\t"$2"\t"$3"\t"$4=$4-12"\t"$5=$4+50"\t"$6"\t"$7"\t"$8"\t"$9" "$10";"$11" "$12";"$13" "$14";"$15" "$16";"$17" "$18";"$19" "$20";"$21" "$22";"$23" "$24";"$25" "$26";"$27" "$28";"$29" "$30";"$31" "$32 ";" $33" "$34";"$35" "$36";"$37" "$38";" }
awk: cmd. line:1:                                                                                                                                                                                                                         ^ syntax error
awk: cmd. line:1: BEGIN { FS="\t;" } {print  $1"\t"$2"\t"$3"\t"$4=$4-12"\t"$5=$4+50"\t"$6"\t"$7"\t"$8"\t"$9" "$10";"$11" "$12";"$13" "$14";"$15" "$16";"$17" "$18";"$19" "$20";"$21" "$22";"$23" "$24";"$25" "$26";"$27" "$28";"$29" "$30";"$31" "$32 ";" $33" "$34";"$35" "$36";"$37" "$38";" }
awk: cmd. line:1:                                                                                                                                                                                                                                      ^ syntax error

您知道如何解决它吗?我想获得一个与输入文件格式完全相同的输出文件.含义相同的定界符.

do you know how to fix it? I want to get the an output file with exactly the same format as input file. meaning the same delimiters.

在awk中修改文本文件 [英] modifying the text file in awk

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

在awk中修改文本文件 [英] modifying the text file in awk

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭