Number of consecutive values above a certain cut-off

Problem description

I am new to bash and Linux programming, and I have a small problem.

For a given cut-off (c), I want to print the values in a file that are above c, but only when two consecutive values are above c. For example:

x y
1 0.34
2 0.3432
3 0.32
4 0.35
5 0.323
6 0.3623
7 0.345

With c=0.33, it should print these values from column 2:

0.34
0.3432
0.3623
0.345

It does not print 0.35, even though 0.35 is above the cut-off of 0.33, because the next value after 0.35 is 0.323, which fails the condition 'two consecutive values are above c'.

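Recommended answer

Original question: print all sequences in which 2 or more consecutive values satisfy a given condition

The following should work: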

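awk 'NR==1 { next }                          # skip the header so "y" is never compared with c
     p || (prev>c && $2>c) { print prev }    # prev is part of a run of 2 or more
     { p = (prev>c && $2>c); prev=$2 }
     END{ if(p) print prev }' c=0.33 <file>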

It has the following logic:

  • p keeps track of whether the previous line has been printed. If it has, the value held back from that pair should be printed as well.
  • If the previous line has not been printed (p==0), check whether the previous value should be printed now, which is the case when (prev>c && $2>c).
  • Compute p for the next line and set prev to the current value.
  • At the end, if p==1, print the last value.

You essentially always run one line behind.
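The one-line-behind logic can be spelled out with explicit state variables. The following is a minimal sketch, not the original answer's code: the function name `runs_above`, the variables `cur`, `pair` and `seen`, and the explicit header skip are assumptions introduced here for illustration.

```shell
# runs_above is a hypothetical helper name. Usage: runs_above CUTOFF FILE
runs_above() {
  awk -v c="$1" '
    NR == 1 { next }                       # skip the "x y" header line
    {
      cur  = ($2 > c)                      # does the current value pass?
      pair = (seen && prev > c && cur)     # previous and current both pass
      if (p || pair) print prev            # the held-back value belongs to a run
      p = pair                             # remember the result for the next line
      prev = $2; seen = 1
    }
    END { if (p) print prev }              # flush the value still held back
  ' "$2"
}

# Sample data from the question, read from standard input via "-"
printf 'x y\n1 0.34\n2 0.3432\n3 0.32\n4 0.35\n5 0.323\n6 0.3623\n7 0.345\n' |
  runs_above 0.33 -
# prints 0.34, 0.3432, 0.3623 and 0.345, one per line
```

The value on each line is only printed once the following line has been read, which is why the END block is needed for the final value.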

Another way to approach this is to check whether each value satisfies the condition and, if it does, store it in an array. When you encounter a value that does not satisfy the condition, process the array. This is a bit more memory intensive:

awk '(NR==1){next}                                # skip the header
     ($2>c) { a[NR]=$2; next }                    # buffer values above c
     (length(a) == 1) { delete a[NR-1]; next }    # a run of one value: discard it
     { for(i=NR-length(a);i<NR;++i) {print a[i]; delete a[i]} }   # run ended: print and clear it
     END { if (length(a)>1) for(i=NR+1-length(a);i<=NR;++i) {print a[i]} }
    ' c=0.33 <file>
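The buffering idea can also be written with an explicit counter instead of calling `length` on the array, which may not be supported by every awk implementation. This is a sketch under that assumption, not the original answer's code: the names `buffered_runs`, `buf`, `k` and `flush` are hypothetical.

```shell
# buffered_runs is a hypothetical helper name. Usage: buffered_runs CUTOFF FILE
buffered_runs() {
  awk -v c="$1" '
    # print the buffered run if it holds at least 2 values, then reset it
    function flush(   i) {
      if (k >= 2) for (i = 1; i <= k; i++) print buf[i]
      k = 0                                # old entries are overwritten by the next run
    }
    NR == 1 { next }                       # skip the header line
    $2 > c  { buf[++k] = $2; next }        # value passes: extend the current run
    { flush() }                            # value fails: the current run has ended
    END { flush() }                        # handle a run that reaches the end of input
  ' "$2"
}
```

Called as `buffered_runs 0.33 data.txt` (where `data.txt` is a placeholder for the input file), it buffers at most the longest run of passing values. Replacing the constant 2 in `flush` would change the minimum run length.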

Second question: print the subsets of consecutive values of $2 in which m or more values satisfy the condition cond and at most n consecutive values do not satisfy cond. Each sequence starts and ends with a value that satisfies cond.

The following awk script will do this. Don't forget to adjust the values m, n and c to your needs and to update the cond function.

function cond(val) { return val > c }
BEGIN{c=0.33; m=2; n=1}
# skip the header
(NR==1){next}
# if no values satisfy cond ...
(M==0 && !cond($2)) { next }
# ... otherwise continue from here
{ a[NR]=$2 }
# set counters M and N (M satisfy cond, N not )
 cond($2) { M++; N=0 }
!cond($2) { N++ }
# This sequence failed, delete it
(N>n && M<m) { for(i in a) delete a[i]; M=0; N=0 }
# This sequence is OK, strip it and print it
(N>n) { j=NR; while (!cond(a[j])) delete a[j--]
        for (i=j+1-length(a);i<=j;++i) { print a[i]; delete a[i] }
        M=0; N=0 }
# Check if the final stored sequence is successful
END { if (M>=m) { 
         j=NR; while (!cond(a[j])) delete a[j--]
         for (i=j+1-length(a);i<=j;++i) print a[i]
      }
    }
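To try different settings without editing the BEGIN block, the parameters can be passed with `-v`. The following is a restructured sketch of the same idea rather than the script above: it tracks the first and last passing line of each candidate sequence instead of trimming the array afterwards. The names `runs_with_gaps`, `emit`, `first` and `last` are assumptions introduced here.

```shell
# runs_with_gaps is a hypothetical helper name. Usage: runs_with_gaps C M N FILE
runs_with_gaps() {
  awk -v c="$1" -v m="$2" -v n="$3" '
    function cond(val) { return val > c }
    # print the stored sequence from its first to its last passing value
    # if it holds at least m passing values, then reset all state
    function emit(   i) {
      if (M >= m) for (i = first; i <= last; i++) print a[i]
      split("", a); M = N = 0              # split with an empty string clears the array
    }
    NR == 1 { next }                       # skip the header line
    cond($2) {                             # passing value: start or extend a sequence
      if (M == 0) first = NR
      a[NR] = $2; M++; N = 0; last = NR
      next
    }
    M == 0 { next }                        # failing value with nothing stored: ignore it
    { a[NR] = $2; N++ }                    # failing value inside a candidate sequence
    N > n { emit() }                       # too many consecutive failures: close it
    END { emit() }                         # close whatever is still stored
  ' "$4"
}
```

On the sample data from the question, `runs_with_gaps 0.33 2 0 <file>` gives the same four values as the original question, while `runs_with_gaps 0.33 2 1 <file>` prints all seven values, because every failing value is then an allowed gap of length one.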
