Randomly Pick Lines From a File Without Slurping It With Unix

Question

I have a file with 10^7 lines, from which I want to pick 1/100 of the lines at random. This is the AWK code I have, but it slurps the entire file into memory beforehand, and my PC does not have enough memory for that. Is there another way to do it?

awk 'BEGIN{srand()}
!/^$/{ a[c++]=$0}
END {  
  for ( i=1;i<=c ;i++ )  { 
    num=int(rand() * c)
    if ( a[num] ) {
        print a[num]
        delete a[num]
        d++
    }
    if ( d == c/100 ) break
  }
 }' file

Solution

If you have that many lines, are you sure you want exactly 1%, or would a statistical estimate be enough?

In the second case, just select each line independently with a probability of 1%:

awk 'BEGIN {srand()} !/^$/ { if (rand() <= .01) print $0}'
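
A minimal sketch of the same per-line sampling wrapped in a shell function, in case the rate or the seed needs to vary. The function name sample_rate, its argument order, and the temporary test file are assumptions made for illustration; they are not part of the answer. Passing a seed explicitly can help because srand() with no argument seeds from the time of day, so two runs started within the same second may produce the same sample.

```shell
# Hypothetical helper (not from the answer): print each non-empty line
# of a file with probability p, using an explicit seed.
sample_rate() {
    # $1 = probability (e.g. 0.01), $2 = numeric seed, $3 = input file
    awk -v p="$1" -v seed="$2" '
        BEGIN { srand(seed + 0) }      # seed once, before any input is read
        !/^$/ && rand() <= p           # no action given, so the line is printed
    ' "$3"
}

# Usage: build a 100000-line test file and sample roughly 1% of it.
input=$(mktemp)
seq 1 100000 > "$input"
sample_rate 0.01 42 "$input" | awk 'END { print NR }'   # close to 1000, not exact
```

Only one line is held in memory at a time, so the size of the file does not matter. The number of lines returned varies from run to run around 1% of the non-empty lines.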

If you'd like the header line plus a random sample of the lines after it, use:

awk 'BEGIN {srand()} !/^$/ { if (rand() <= .01 || FNR==1) print $0}'
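
If exactly 1% is required after all, one possible approach is reservoir sampling, which keeps only the k sampled lines in memory rather than the whole file. This is a swapped-in technique, not what the answer above proposes, and the function name sample_exact and its arguments are hypothetical. The sketch assumes k is known in advance (for example from a separate pass that counts the lines) and that k lines fit in memory.

```shell
# Hypothetical helper (not from the answer): pick exactly k non-empty
# lines uniformly at random in a single pass, storing only k lines.
sample_exact() {
    # $1 = number of lines to keep (k), $2 = numeric seed, $3 = input file
    awk -v k="$1" -v seed="$2" '
        BEGIN { srand(seed + 0); k += 0 }   # force k to be numeric
        /^$/ { next }                       # skip empty lines, as in the question
        {
            n++                             # non-empty lines seen so far
            if (n <= k) {
                r[n] = $0                   # fill the reservoir first
            } else {
                j = int(rand() * n) + 1     # uniform integer in 1..n
                if (j <= k) r[j] = $0       # replace a slot with probability k/n
            }
        }
        END {
            m = (n < k) ? n : k             # fewer than k lines: print them all
            for (i = 1; i <= m; i++) print r[i]
        }
    ' "$3"
}

# Usage: draw exactly 1000 lines (1%) from a 100000-line test file.
input=$(mktemp)
seq 1 100000 > "$input"
sample_exact 1000 42 "$input" | awk 'END { print NR }'   # exactly 1000 lines
```

The lines come out in reservoir order rather than file order, so pipe the result through a sort on a line-number key if the original order matters.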
