Use awk with two different delimiters to split and select columns

Question

How can I tell gawk to use two different delimiters so that I can separate some columns, but select others using the tab-delimited format of my file?

> cat broad_snps.tab

  chrsnpID  rsID    freq_bin    snp_maf gene_count  dist_nearest_gene_snpsnap   dist_nearest_gene_snpsnap_protein_coding    dist_nearest_gene   dist_nearest_gene_located_within    loci_upstream   loci_downstream ID_nearest_gene_snpsnap ID_nearest_gene_snpsnap_protein_coding  ID_nearest_gene ID_nearest_gene_located_within  HGNC_nearest_gene_snpsnap   HGNC_nearest_gene_snpsnap_protein_coding    flag_snp_within_gene    flag_snp_within_gene_protein_coding ID_genes_in_matched_locus   friends_ld01    friends_ld02    friends_ld03    friends_ld04    friends_ld05    friends_ld06    friends_ld07    friends_ld08    friends_ld09    -1    
  10:10001753   10:10001753 7   0.07455 0   98932.0 1045506.0   98932.0 inf 9986766 10039928    ENSG00000224788 ENSG00000048740 ENSG00000224788         CELF2   False   False       253.0   103.0   55.0    40.0    35.0    33.031.0    20.0    0.0 -1  
  10:10001794   10:10001794 41  0.4105  0   98891.0 1045465.0   98891.0 inf 9964948 10071879    ENSG00000224788 ENSG00000048740 ENSG00000224788         CELF2   False   False       365.0   299.0   294.0   266.0   168.0   138.58.0    45.0    0.0 -1  
  10:100023489  10:100023489    10  0.1054  1   4518.0  4518.0  4518.0  4518.0  100023489   100023489   ENSG00000138131 ENSG00000138131 ENSG00000138131 ENSG00000138131 LOXL4   LOXL4   True    True    ENSG00000138131 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 -1  
  10:100025128  10:100025128    45  0.4543  1   2879.0  2879.0  2879.0  2879.0  100025128   100025128   ENSG00000138131 ENSG00000138131 ENSG00000138131 ENSG00000138131 LOXL4   LOXL4   True    True    ENSG00000138131 112.0   70.0    3.0 0.0 0.0    

Desired output:

chr10   10001752    10001753    CELF2
chr10   10001793    10001794    CELF2
chr10   100023488   100023489   LOXL4
chr10   100025127   100025128   LOXL4
chr10   10002974    10002975    LOXL4

The command I am currently using:

cat broad_snps.tab | tail -n+2 |  gawk -vOFS="\t" -vFS=":" '{ print "chr"$1, ($2 - 1), $2}' | gawk -vOFS="\t" '{print $1, $2, $3}' > broad_SNPs.bed

It returns this:

chr10   10001752    10001753    10
chr10   10001793    10001794    10
chr10   100023488   100023489   10
chr10   100025127   100025128   10
chr10   10002974    10002975    10
chr10   10003391    10003392    10
chr10   100038815   100038816   10
chr10   10008001    10008002    10
chr10   100093012   100093013   10
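A quick way to see where the stray `10` comes from: with `FS=":"` only colons separate fields, so the tab-separated columns are never split apart and `$2` runs from the first colon to the next one. The sketch below feeds awk one hypothetical line shaped like the first three columns of `broad_snps.tab` and makes the embedded tab visible; `inspect_colon_fields` is a made-up helper name, not part of the question.

```shell
#!/bin/sh
# inspect_colon_fields: hypothetical helper. Feeds one line shaped like the first
# three tab-separated columns of broad_snps.tab to awk with ":" as the only separator.
inspect_colon_fields() {
  printf '10:10001753\t10:10001753\t7\n' | awk -F':' '{
    f2 = $2
    gsub(/\t/, "<TAB>", f2)   # show the tab that is still inside $2
    # NF = number of colon-separated fields; $2 - 1 uses only the leading number of $2
    print NF, f2, $2 - 1
  }'
}

inspect_colon_fields   # prints: 3 10001753<TAB>10 10001752
```

So `$2` printed by the first `gawk` carries a tab and the chromosome of the rsID column along with it, while `($2 - 1)` silently keeps only the leading number. That is consistent with the extra `10` column in the output above.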

I'd like to be able to use the ":" delimiter to split up the first column, but I need to use "\t" to pick out the gene ID.
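What this paragraph describes can be sketched, under some assumptions, with awk's built-in `split()` function: keep the tab as the field separator and split only the first column on `:`. This is an alternative to the recommended answer, not the answer itself. It assumes the gene symbol is in tab-separated column 17 (`HGNC_nearest_gene_snpsnap_protein_coding` in the header above); `to_bed`, `row` and `sample` are hypothetical helper names, and the sample reuses two rows from the question cut down to their first 17 columns.

```shell
#!/bin/sh
# to_bed: hypothetical helper. Tab stays the field separator; only $1 is split on ":".
# Assumes the gene symbol is in tab-separated column 17.
to_bed() {
  awk -F'\t' -v OFS='\t' '
    NR > 1 {                         # skip the header line
      n = split($1, loc, ":")        # "10:100023489" -> loc[1] = "10", loc[2] = "100023489"
      if (n != 2) next               # skip rows whose first column is not chrom:pos
      print "chr" loc[1], loc[2] - 1, loc[2], $17
    }' "$@"
}

# row: hypothetical helper that joins its arguments with tabs.
row() {
  (IFS=$(printf '\t'); printf '%s\n' "$*")
}

# sample: a placeholder header plus two rows from the question, cut to 17 columns.
sample() {
  echo 'header'
  row 10:100023489 10:100023489 10 0.1054 1 4518.0 4518.0 4518.0 4518.0 \
      100023489 100023489 ENSG00000138131 ENSG00000138131 ENSG00000138131 \
      ENSG00000138131 LOXL4 LOXL4
  row 10:100025128 10:100025128 45 0.4543 1 2879.0 2879.0 2879.0 2879.0 \
      100025128 100025128 ENSG00000138131 ENSG00000138131 ENSG00000138131 \
      ENSG00000138131 LOXL4 LOXL4
}

sample | to_bed
```

On the real file this would be `to_bed broad_snps.tab > broad_SNPs.bed`. The appeal of `split()` here is that the tab numbering is left untouched, so column numbers can be read straight off the header line.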

Thanks!

Recommended answer

awk -F'[\t:]' '{print $1, $2, $4, $17}'
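The answer works because `-F'[\t:]'` sets the field separator to a bracket expression, so every tab and every colon ends a field. The catch is that the two `chrom:pos` columns now count as four fields, which shifts the numbering of everything after them. The sketch below prints the field numbers for one hypothetical line shaped like the first four columns of a data row; `number_fields` is a made-up helper name.

```shell
#!/bin/sh
# number_fields: hypothetical helper. Prints "index value" for every field when
# both a tab and a colon act as field separators.
number_fields() {
  awk -F'[\t:]' '{ for (i = 1; i <= NF; i++) print i, $i }'
}

printf '10:100023489\t10:100023489\t10\t0.1054\n' | number_fields
# 1 10
# 2 100023489
# 3 10
# 4 100023489
# 5 10
# 6 0.1054
```

From the third tab-separated column on, column k of the header becomes `$(k+2)` under this separator. If the columns line up with the header in the question, `HGNC_nearest_gene_snpsnap_protein_coding` (column 17 there) would be `$19`, and a version producing the BED-style output might look like `awk -F'[\t:]' -v OFS='\t' 'NR > 1 { print "chr" $1, $2 - 1, $2, $19 }' broad_snps.tab`. Check the field numbers against a real row before relying on them.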
