Discrete to continuous number ranges via awk

Question

Assume a text file, file, that contains multiple discrete number ranges, one per line. Each range is preceded by a string (the range name). The lower and upper bounds of each range are separated by a dash, and each range is followed by a semicolon. The ranges are sorted (e.g., range 101-297 comes before 1299-1301) and do not overlap.

$ cat file
foo  101-297;
bar  1299-1301;
baz  1314-5266;

Please note that in the example above the three ranges do not form a continuous range that starts at integer 1.

I believe awk is an appropriate tool for filling in the missing number ranges so that, taken together, all ranges form a continuous range from 1 to the upper bound of the last range. If so, what awk command or function would you use to perform this task? The desired output, with each inserted range named new1, new2, and so on, is:

$ cat file | sought_awk_command
new1 1-100;
foo  101-297;
new2 298-1298;
bar  1299-1301;
new3 1302-1313;
baz  1314-5266;

-

Edit 1: On closer evaluation, the code suggested below fails on another simple example.

$ cat example2
foo  101-297;
bar  1299-1301;
baz  1302-1314; # Notice that ranges "bar" and "baz" are contiguous
qux  1399-5266;

$ awk -F'[ -]' '$3-Q>1{print "new"++o,Q+1"-"$3-1";";Q=$4} 1' example2
new1 1-100;
foo  101-297;
new2 298-1298;
bar  1299-1301;
baz  1302-1314;
new3 1302-1398; # ERROR HERE: range "new3" starts at 1302, one past the upper bound of "bar" (1301), instead of one past the upper bound of "baz" (1314).
qux  1399-5266;
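
The output above goes wrong because Q=$4 sits inside the gap-printing action, so Q only advances when a gap is printed. Since "baz" follows "bar" with no gap, Q still holds bar's upper bound (1301) when "qux" is read. A minimal sketch of a fix, assuming the same two-space layout that -F'[ -]' relies on (each single space is a separator, so $2 is the empty field between the two spaces, $3 the lower bound and $4 the upper bound with its trailing ";"), moves the update into its own unconditional action. The helper name fill_gaps is hypothetical.

```shell
# fill_gaps (hypothetical name): reads "name  lo-hi;" lines on stdin,
# with exactly two spaces between name and range.
fill_gaps() {
  awk -F'[ -]' '
    # Gap before this range: print the missing numbers using the previous end.
    $3-Q>1 { print "new"++o, Q+1 "-" $3-1 ";" }
    # Always remember the last upper bound, whether or not a gap was printed.
    { Q=$4 }
    # Print the original line.
    1
  '
}

# example2 data, without the "# Notice ..." annotation.
printf '%s\n' 'foo  101-297;' 'bar  1299-1301;' 'baz  1302-1314;' 'qux  1399-5266;' | fill_gaps
```

With this change, nothing is inserted between the contiguous "bar" and "baz", and the last gap becomes new3 1315-1398.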

-

Edit 2: Many thanks to RavinderSingh13 for assistance with solving this question. However, the suggested code still generates output inconsistent with the given objective.

$ cat example3
foo  35025-35144;
bar  35259-35375;
baz  35376-35624;
qux  37911-39434;

$ awk -F'[ -]' '$3-Q+0>=1{print "new"++o,Q+1"-"$3-1";";Q=$4} {Q=$4;print}' example3
new1 1-35024;
foo  35025-35144;
new2 35145-35258;
bar  35259-35375;
new3 35376-35375; # ERROR HERE: Notice that range "new3" has been added, even though ranges "bar" and "baz" are contiguous.
baz  35376-35624;
new4 35625-37910;
qux  37911-39434;
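
Here Q is updated on every line, but the condition $3-Q+0>=1 also fires when the difference is exactly 1, which is the contiguous case (35376 - 35375 = 1), so the inserted "gap" has a lower bound greater than its upper bound. A minimal sketch of the fix, again assuming the two-space layout and using a hypothetical helper name, keeps the structure of that command, makes the comparison strict, and drops the redundant Q=$4 from the first action:

```shell
# fill_gaps (hypothetical name): same input layout as in the question.
fill_gaps() {
  awk -F'[ -]' '
    # A difference of exactly 1 means the ranges are contiguous,
    # so only a difference greater than 1 leaves numbers uncovered.
    $3-Q>1 { print "new"++o, Q+1 "-" $3-1 ";" }
    # Remember the upper bound (the trailing ";" is ignored in arithmetic)
    # and print the original line.
    { Q=$4; print }
  '
}

# example3 data.
printf '%s\n' 'foo  35025-35144;' 'bar  35259-35375;' 'baz  35376-35624;' 'qux  37911-39434;' | fill_gaps
```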


Recommended Answer

This script also copes with overlapping ranges, such as those in your original version of example2, where "bar 1299-1301;" and "baz 1301-1314;" overlapped at 1301 (which is why the example2 run below shows baz as 1301-1314).

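$ cat tst.awk
# Split the range field (e.g. "101-297;") on "-" and ";" to get its bounds.
{ split($2,curr,/[-;]/); currStart=curr[1]; currEnd=curr[2] }
# Gap: this range starts more than 1 past the previous end (unset prevEnd acts as 0).
currStart > (prevEnd+1) { print "new"++cnt, prevEnd+1 "-" currStart-1 ";" }
# Print the original line and remember its upper bound.
{ print; prevEnd=currEnd }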

$ awk -f tst.awk file
new1 1-100;
foo  101-297;
new2 298-1298;
bar  1299-1301;
new3 1302-1313;
baz  1314-5266;

$ awk -f tst.awk example2
new1 1-100;
foo  101-297;
new2 298-1298;
bar  1299-1301;
baz  1301-1314;
new3 1315-1398;
qux  1399-5266;

$ awk -f tst.awk example3
new1 1-35024;
foo  35025-35144;
new2 35145-35258;
bar  35259-35375;
baz  35376-35624;
new3 35625-37910;
qux  37911-39434;
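
Unlike the -F'[ -]' one-liners in the question, which need exactly two spaces between the name and the range, tst.awk uses awk's default field splitting, where a run of blanks or tabs counts as one separator, so $2 is the range however the columns are spaced. The sketch below inlines the same logic under a hypothetical helper name and feeds it a hypothetical input that uses one space, a tab, and three spaces; the +0 only makes the numeric comparison explicit.

```shell
# fill_gaps_ws (hypothetical name): the tst.awk logic, inlined.
fill_gaps_ws() {
  awk '
    # Default FS: $1 is the name, $2 the range such as "101-297;".
    { split($2, curr, /[-;]/); currStart = curr[1]+0; currEnd = curr[2]+0 }
    # Emit a new range when this one starts more than 1 past the previous end.
    currStart > (prevEnd+1) { print "new"++cnt, prevEnd+1 "-" currStart-1 ";" }
    # Print the original line unchanged and remember its upper bound.
    { print; prevEnd = currEnd }
  '
}

# Same ranges as "file", but with irregular spacing between the columns.
printf 'foo 101-297;\nbar\t1299-1301;\nbaz   1314-5266;\n' | fill_gaps_ws
```

The inserted lines are identical to those produced for file (new1 1-100;, new2 298-1298;, new3 1302-1313;), while the original lines are printed with their own spacing.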
