Number of consecutive values above a certain cut-off
Problem description
I am new to bash and Linux programming, and I have a small problem.
For a particular cut-off (c), I want to write out a file containing the values above c, but only where two consecutive values are both above c. For example:
x y
1 0.34
2 0.3432
3 0.32
4 0.35
5 0.323
6 0.3623
7 0.345
With c=0.33, it should print these values from column 2:
0.34
0.3432
0.3623
0.345
It will not print 0.35, even though it is above the cut-off of 0.33, because the next value after 0.35 is 0.323, which fails the requirement that two consecutive values are above c.
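Recommended answer
Original question: print all sequences in which 2 or more consecutive values satisfy a given condition
The following should work: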
awk 'NR==1 { next }   # skip the header so "y" is never compared with c
p || (NR>2 && prev>c && $2>c) { print prev }
{ p = (NR>2 && prev>c && $2>c); prev=$2 }
END { if (p) print prev }' c=0.33 file
It has the following logic:
- p keeps track of whether the previous line has been printed. If it has, the current line should be printed as well.
- If the previous line has not been printed (p==0), check whether it should be printed, that is, whether prev>c && $2>c.
- Compute p for the next line and set prev to the current value.
- At the end, if p==1, print the last value.
You essentially always run one line behind.
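The logic above can be tried on the sample data from the question. This is only a minimal sketch, not the exact command from the answer: the function name runs_above and the have_prev flag are made up for illustration, the cut-off is passed with -v, and the input is assumed to start with a one-line header.

```shell
# Hypothetical wrapper: print every value of column 2 that belongs to a run
# of two or more consecutive values above the cut-off given as $1.
# Assumes the first input line is a header.
runs_above() {
  awk -v c="$1" '
    NR == 1 { next }                           # skip the "x y" header
    # We run one line behind: decide here whether the PREVIOUS value is printed.
    have_prev && (p || (prev > c && $2 > c)) { print prev }
    {
      p = (have_prev && prev > c && $2 > c)    # current value continues a run
      prev = $2
      have_prev = 1
    }
    END { if (p) print prev }                  # flush the last value
  '
}

# Sample data from the question, cut-off 0.33
printf '%s\n' 'x y' '1 0.34' '2 0.3432' '3 0.32' '4 0.35' '5 0.323' '6 0.3623' '7 0.345' |
  runs_above 0.33
```

For the sample data this should print 0.34, 0.3432, 0.3623 and 0.345, one per line.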
Another way to approach this is to store each value that satisfies the condition in an array. When you encounter a value that does not satisfy the condition, process the array. This is a bit more memory-intensive:
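awk '(NR==1) { next }                          # skip the header
($2>c) { a[NR]=$2; next }                      # buffer values above the cut-off
(length(a) == 1) { delete a[NR-1]; next }      # a single value is not a run: drop it
{ for(i=NR-length(a);i<NR;++i) {print a[i]; delete a[i]} }   # flush a run of 2 or more
END { if (length(a)>1) for(i=NR+1-length(a);i<=NR;++i) {print a[i]} }
' c=0.33 file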
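Calling length() on an array may not be accepted by every awk implementation, so here is a hedged variant of the same buffering idea that keeps its own counter instead. The names runs_above_buffered, buf, cnt and emit_run are assumptions made for this sketch and do not come from the answer.

```shell
# Hypothetical variant of the buffering approach that keeps an explicit
# counter (cnt) instead of calling length() on an array.
# Assumes the first input line is a header.
runs_above_buffered() {
  awk -v c="$1" '
    # print the buffered run if it holds at least two values, then reset it
    function emit_run(   i) {
      if (cnt > 1) for (i = 1; i <= cnt; i++) print buf[i]
      cnt = 0
    }
    NR == 1 { next }                     # skip the header
    $2 > c  { buf[++cnt] = $2; next }    # value above the cut-off: buffer it
    { emit_run() }                       # any other value ends the current run
    END { emit_run() }                   # the input may end in the middle of a run
  '
}

# Sample data from the question, cut-off 0.33
printf '%s\n' 'x y' '1 0.34' '2 0.3432' '3 0.32' '4 0.35' '5 0.323' '6 0.3623' '7 0.345' |
  runs_above_buffered 0.33
```

Old entries in buf are simply overwritten once cnt is reset, so no delete is needed.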
Second question: print the subsets of consecutive values of $2 in which m or more values satisfy the condition cond and at most n consecutive values do not satisfy cond. Each sequence starts and ends with a value satisfying cond.
The following awk script will do this. Don't forget to adjust the values of m, n and c as needed and to update the cond function.
function cond(val) { return val > c }
BEGIN{c=0.33; m=2; n=1}
# skip the header
(NR==1){next}
# if no values satisfy cond ...
(M==0 && !cond($2)) { next }
# ... otherwise continue from here
{ a[NR]=$2 }
# set counters M and N (M satisfy cond, N not )
cond($2) { M++; N=0 }
!cond($2) { N++ }
# This sequence failed, delete it
(N>n && M<m) { for(i in a) delete a[i]; M=0; N=0 }
# This sequence is OK, strip it and print it
(N>n) { j=NR; while (!cond(a[j])) delete a[j--]
        for (i=j+1-length(a);i<=j;++i) { print a[i]; delete a[i] }
        M=0; N=0 }
# Check if the final stored sequence is successful
END { if (M>=m) {
        j=NR; while (!cond(a[j])) delete a[j--]
        for (i=j+1-length(a);i<=j;++i) print a[i]
      }
    }
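To experiment with different values of c, m and n without editing the script, the same idea can be wrapped in a shell function. This is a simplified sketch under stated assumptions, not the script above: the names runs_with_gaps, buf, cnt, good, gap, last_good and emit_seq are hypothetical, the parameters are passed with -v, and a one-line header is assumed.

```shell
# Hypothetical wrapper: runs_with_gaps CUTOFF M N
# Prints sequences of column 2 that start and end with a value above CUTOFF,
# contain at least M such values, and never have more than N consecutive
# values at or below CUTOFF in between. Assumes a one-line header.
runs_with_gaps() {
  awk -v c="$1" -v m="$2" -v n="$3" '
    function cond(val) { return val > c }
    # print the stored sequence up to its last value satisfying cond, then reset
    function emit_seq(   i) {
      if (good >= m) for (i = 1; i <= last_good; i++) print buf[i]
      cnt = good = gap = last_good = 0
    }
    NR == 1  { next }                      # skip the header
    cond($2) { buf[++cnt] = $2; good++; gap = 0; last_good = cnt; next }
    cnt == 0 { next }                      # no sequence in progress
    { buf[++cnt] = $2; if (++gap > n) emit_seq() }   # gap too long: close the sequence
    END { emit_seq() }                     # check the final stored sequence
  '
}

# Sample data from the question: c=0.33, m=2, n=0 reproduces the first question
printf '%s\n' 'x y' '1 0.34' '2 0.3432' '3 0.32' '4 0.35' '5 0.323' '6 0.3623' '7 0.345' |
  runs_with_gaps 0.33 2 0
```

With n=0 this should give the same four values as the first question. With n=1 the single values below the cut-off no longer break the sequence, so all seven values of the sample are printed.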