In awk, search for certain columns of the current line

Problem description

I have a file named c_FROM_V_273_008245_50_neighbours_SYMREMO.out that looks like:

NEIGHBORS OF THE NON-EQUIVALENT ATOMS

N = NUMBER OF NEIGHBORS AT DISTANCE R
ATOM  N     R/ANG      R/AU   NEIGHBORS (ATOM LABELS AND CELL INDICES)
1 CA   1     2.4055     4.5458    7 O    0 0 0
1 CA   1     2.4058     4.5463   10 O    0-1 0
1 CA   1     2.4356     4.6026   14 O    0 0 0
.
.
.

If I wanted to search for the distance in R/ANG for 1 CA 7 O, it would be 2.4055

I have created this script, search_for_distance.awk:

 {if ($0 ~ "NEIGHBORS OF THE NON-EQUIVALENT ATOMS") {FLAG=1}};
 # If the current line of the file contains that string, we assign it a FLAG=1

    {if (FLAG==1)
            {if ($0 ~ "^   1 CA"){LINE=$0;
            exit}
            }
    };
    # Here I am searching for "1 CA" on each line

 END{VOL=FILENAME;
 # The filename is: "c_FROM_V_273_008245_50_neighbours_SYMREMO.out"
 # My intention is to end up with a new file with 2 columns:
 # "volume" and "distance". 
 # Notice that the filename contains the volume: 273.008245

 gsub("^.*_V_","",VOL);
 gsub("_",".",VOL);
 gsub(".50.neighbours.SYMREMO.out"," ",VOL);
 # Some substitutions to make "c_FROM_V_273_008245_50_neighbours_SYMREMO.out" 
 # to be "273.008245"

 # Up to now the output of running:
 # awk -f search_for_distance.awk c_FROM_V_273_008245_50_neighbours_SYMREMO.out
 # is the following:

 # 273.008245     1 CA   1     2.4055     4.5458    7 O    0 0 0

 # So, I need to take LINE and only extract column "4".
 # This is done by a "split" command:

 {split(LINE,array," ")}   

 print VOL,array[4]}

The output of running awk -f search_for_distance.awk c_FROM_V_273_008245_50_neighbours_SYMREMO.out is the following:

 273.008245  2.4055

Notice that the script prints the first appearance of 1 CA, which happens to be 1 CA 7 O, which is what I wanted.

But now I need to run this to find the first appearance of many other distances.

I would like to search for the first appearance of the 1 CA 14 O distance. I would only have to modify the first bit of the code where I am searching from the beginning of the line to 1 CA:

 {if ($0 ~ "NEIGHBORS OF THE NON-EQUIVALENT ATOMS") {FLAG=1}};
 # If the current line contains that string, we assign it a FLAG=1

    {if (FLAG==1)
            {if ($0 ~ "^   1 CA"){LINE=$0;
            exit}
            }
    };

How could I introduce an order to search for 1 CA 14 O?

Something like

    {if (FLAG==1)
            {if ($0 ~ "/1 CA   && /14 O"){LINE=$0;
            exit}
            }
    };

Thank you very much for your help.

Solution

I want to search for the distance in R/ANG for 1 CA 7 O, which in this case is 2.4055

$ awk '$1==1 && $2=="CA" && $6==7 && $7=="O" {print $4}' file
2.4055

To find R/ANG for 1 CA 14 O:

$ awk '$1==1 && $2=="CA" && $6==14 && $7=="O" {print $4}' file
2.4356

How it works

  • $1==1 && $2=="CA" && $6==7 && $7=="O"

    This selects lines for which the four stated conditions are true.

  • print $4

    For the selected lines, this prints the fourth field.
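
The field tests above can be combined with the volume extraction from the question. The following is only a sketch under some assumptions: the columns are whitespace-separated exactly as in the sample, the volume appears in the file name as two underscore-separated tokens right after `_V_`, and `n1`, `a1`, `n2`, `a2` are hypothetical variable names chosen here for the two atom labels. The sample file is recreated so the example is self-contained.

```shell
# Recreate the sample data from the question under its original file name.
f=c_FROM_V_273_008245_50_neighbours_SYMREMO.out
cat > "$f" <<'EOF'
NEIGHBORS OF THE NON-EQUIVALENT ATOMS

N = NUMBER OF NEIGHBORS AT DISTANCE R
ATOM  N     R/ANG      R/AU   NEIGHBORS (ATOM LABELS AND CELL INDICES)
   1 CA   1     2.4055     4.5458    7 O    0 0 0
   1 CA   1     2.4058     4.5463   10 O    0-1 0
   1 CA   1     2.4356     4.6026   14 O    0 0 0
EOF

# n1/a1 and n2/a2 are hypothetical names for the two atom labels to match.
result=$(awk -v n1=1 -v a1=CA -v n2=14 -v a2=O '
    $1 == n1 && $2 == a1 && $6 == n2 && $7 == a2 {
        vol = FILENAME
        sub(/^.*_V_/, "", vol)   # drop everything up to and including "_V_"
        split(vol, part, "_")    # part[1] = "273", part[2] = "008245"
        print part[1] "." part[2], $4
        exit                     # stop at the first matching line
    }' "$f")
echo "$result"
```

Because of the `exit`, only the first matching line of the first file is reported. To build the two-column volume/distance table for many files, one option is to call the same command once per file from a shell loop and append each line to an output file.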
