In awk, search for certain columns of the current line
Problem description
I have a file named c_FROM_V_273_008245_50_neighbours_SYMREMO.out that looks like:

    NEIGHBORS OF THE NON-EQUIVALENT ATOMS
    N = NUMBER OF NEIGHBORS AT DISTANCE R
    ATOM N R/ANG R/AU NEIGHBORS (ATOM LABELS AND CELL INDICES)
    1 CA 1 2.4055 4.5458 7 O 0 0 0
    1 CA 1 2.4058 4.5463 10 O 0-1 0
    1 CA 1 2.4356 4.6026 14 O 0 0 0
    .
    .
    .

If I wanted to search for the distance in R/ANG for 1 CA 7 O, it would be 2.4055.
I have created this script, search_for_distance.awk:

    {if ($0 ~ "NEIGHBORS OF THE NON-EQUIVALENT ATOMS") {FLAG=1}};
    # If the current line of the file contains that string, we assign it a FLAG=1

    {if (FLAG==1)
        {if ($0 ~ "^ 1 CA"){LINE=$0; exit}
        }
    };
    # Here I am searching for "1 CA" on each line

    END{VOL=FILENAME;
    # The filename is: "c_FROM_V_273_008245_50_neighbours_SYMREMO.out"
    # My intention is to end up with a new file with 2 columns:
    # "volume" and "distance".
    # Notice that the filename contains the volume: 273.008245
    gsub("^.*_V_","",VOL);
    gsub("_",".",VOL);
    gsub(".50.neighbours.SYMREMO.out"," ",VOL);
    # Some substitutions to make "c_FROM_V_273_008245_50_neighbours_SYMREMO.out"
    # to be "273.008245"
    # Up to now the output of running:
    # awk -f search_for_distance.awk c_FROM_V_273_008245_50_neighbours_SYMREMO.out
    # is the following:
    # 273.008245 1 CA 1 2.4055 4.5458 7 O 0 0 0
    # So, I need to take LINE and only extract column "4".
    # This is done by a "split" command:
    {split(LINE,array," ")}
    print VOL,array[4]}

The output of running

    awk -f search_for_distance.awk c_FROM_V_273_008245_50_neighbours_SYMREMO.out

is the following:

    273.008245 2.4055

Notice that the script prints the first appearance of 1 CA, which happens to be 1 CA 7 O, which is what I wanted.
But now I need to run this to find the first appearance of many other distances.
I would like to search for the first appearance of the 1 CA 14 O distance. I would only have to modify the first part of the code, where I search for 1 CA at the beginning of the line:

    {if ($0 ~ "NEIGHBORS OF THE NON-EQUIVALENT ATOMS") {FLAG=1}};
    # If the current line contains that string, we assign it a FLAG=1

    {if (FLAG==1)
        {if ($0 ~ "^ 1 CA"){LINE=$0; exit}
        }
    };

How could I introduce an order to search for 1 CA 14 O? Something like:

    {if (FLAG==1)
        {if ($0 ~ "/1 CA && /14 O"){LINE=$0; exit}
        }
    };

Thank you very much for your help
Solution
"I want to search for the distance in R/ANG for 1 CA 7 O, which in this case is 2.4055"
    $ awk '$1==1 && $2=="CA" && $6==7 && $7=="O" {print $4}' file
    2.4055
To find R/Ang for 1 CA 14 O:
    $ awk '$1==1 && $2=="CA" && $6==14 && $7=="O" {print $4}' file
    2.4356
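The two commands differ only in the neighbour index, so the test can be parametrised with awk's -v option. The following is a minimal sketch, assuming the column layout shown in the question; the function name first_distance and the awk variables atom, label, nb and nblabel are illustrative and not part of the original answer. It also adds an exit so that only the first matching line is printed, which is what the question asks for.

```shell
# Hypothetical helper: print R/ANG for the first line that matches the
# given atom index, atom label, neighbour index and neighbour label.
first_distance() {
    # $1=file  $2=atom index  $3=atom label  $4=neighbour index  $5=neighbour label
    awk -v atom="$2" -v label="$3" -v nb="$4" -v nblabel="$5" '
        $1 == atom && $2 == label && $6 == nb && $7 == nblabel {
            print $4    # the fourth field is R/ANG
            exit        # stop at the first appearance
        }
    ' "$1"
}
```

With the sample data from the question, calling first_distance file 1 CA 14 O should print 2.4356.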
How it works
    $1==1 && $2=="CA" && $6==7 && $7=="O"
This selects lines for which the four stated conditions are true.
print $4
For the selected lines, this prints the fourth field.
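The question also wants the volume taken from the file name printed next to the distance. The field tests can be combined with the header flag and the file-name substitutions from the question. This is only a sketch under stated assumptions: the file name follows the c_FROM_V_<volume>_50_neighbours_SYMREMO.out pattern, the awk in use keeps FILENAME available in the END block (gawk and mawk do), and the names volume_distance, in_table, dist and vol are made up for illustration.

```shell
# Hypothetical helper: print "volume distance" for the first 1 CA line
# whose neighbour index and label match the arguments.
volume_distance() {
    # $1=file  $2=neighbour index  $3=neighbour label
    awk -v nb="$2" -v nblabel="$3" '
        # Start looking only after the table header has been seen.
        /NEIGHBORS OF THE NON-EQUIVALENT ATOMS/ { in_table = 1; next }

        in_table && $1 == 1 && $2 == "CA" && $6 == nb && $7 == nblabel {
            dist = $4   # R/ANG of the first matching line
            exit        # the END block still runs after exit
        }

        END {
            vol = FILENAME
            sub(/^.*_V_/, "", vol)                        # drop everything up to "_V_"
            sub(/_50_neighbours_SYMREMO\.out$/, "", vol)  # drop the fixed suffix
            sub(/_/, ".", vol)                            # 273_008245 becomes 273.008245
            if (dist != "") print vol, dist
        }
    ' "$1"
}
```

Run as volume_distance c_FROM_V_273_008245_50_neighbours_SYMREMO.out 14 O, this should print 273.008245 2.4356 for the sample file. Comparing whole fields instead of matching a regular expression against $0 avoids depending on the exact spacing between columns.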