Randomly Pick Lines From a File Without Slurping It With Unix
Problem description
I have a file with 10^7 lines, and I want to pick 1/100 of its lines at random. This is the AWK code I have, but it slurps the entire file content into memory beforehand. My PC's memory cannot handle that. Is there another way to do it?
awk 'BEGIN { srand() }
!/^$/ { a[c++] = $0 }
END {
    for (i = 1; i <= c; i++) {
        num = int(rand() * c)
        if (a[num]) {
            print a[num]
            delete a[num]
            d++
        }
        if (d == c/100) break
    }
}' file
Solution
If you have that many lines, are you sure you want exactly 1%, or would a statistical estimate be enough?
In the second case, just keep each line independently with a probability of 1%:
awk 'BEGIN {srand()} !/^$/ { if (rand() <= .01) print $0}' file
If you'd like the header line plus a random sample of the lines after it, use:
awk 'BEGIN {srand()} !/^$/ { if (rand() <= .01 || FNR==1) print $0}' file
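If you do need exactly 1% rather than an estimate, a rough sketch is to read the file twice instead of holding it in memory: count the non-empty lines first, then keep each line with probability (lines still needed) / (lines still remaining). This is selection sampling, a different technique from the one-liners above, not part of the original answer. The function name sample_exact and its optional percentage argument are made up for illustration.

```shell
# sample_exact FILE [PERCENT]   (hypothetical helper, PERCENT defaults to 1)
# Prints exactly floor(n * PERCENT / 100) of the n non-empty lines of FILE,
# in their original order, using constant memory.
sample_exact() {
    file=$1
    pct=${2:-1}

    # Pass 1: count the non-empty lines.
    n=$(awk '!/^$/ { c++ } END { print c + 0 }' "$file")
    k=$(( n * pct / 100 ))   # exact sample size, rounded down

    # Pass 2: keep a line with probability (k - picked) / (n - seen).
    # Once picked reaches k nothing more is kept; if the remaining lines
    # equal the lines still needed, every one of them is kept.
    awk -v n="$n" -v k="$k" 'BEGIN { srand() }
    !/^$/ {
        if (rand() * (n - seen) < k - picked) { print; picked++ }
        seen++
    }' "$file"
}

# Usage (assuming your data is in "file"):
#   sample_exact file > sample.txt
```

The cost is a second read of the file, which is usually far cheaper than storing 10^7 lines in an array. Note that srand() with no argument seeds from the time of day, so two runs within the same second will likely produce the same sample.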