How can I get exactly n random lines from a file with Perl?

Views: 162. Published: 2020/6/14 19:28:40. Tags: perl, random-sample, file-processing

Problem description

Following up on this question, I need to get exactly n lines at random out of a file (or stdin). This would be similar to head or tail, except I want some from the middle.

Now, other than looping over the file with the solutions to the linked question, what's the best way to get exactly n lines in one run?

For reference, I've tried:

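#!/usr/bin/perl -w
use strict;

my $ratio = shift;    # keep roughly 1 line in $ratio
print $ratio, "\n";
while (<>) {          # read the remaining arguments as files, or STDIN
    # int(rand $ratio) is uniform over 0 .. $ratio-1, so each line is
    # printed independently with probability 1/$ratio
    print if ((int rand $ratio) == 1);
}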

where $ratio sets the rough fraction of lines I want, about 1 in $ratio. For instance, if I want 1 in 10 lines:

random_select 10 a.list

However, this doesn't give me an exact count:

aaa> foreach i ( 0 1 2 3 4 5 6 7 8 9 )
foreach? random_select 10 a.list | wc -l
foreach? end
4739
4865
4739
4889
4934
4809
4712
4842
4814
4817
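The following shows why the count varies from run to run. The script keeps each line independently with probability p = 1/r, where r is $ratio and r ≥ 2 is assumed. The number of kept lines K is therefore binomially distributed, so it is only right on average. The value N = 48,000 below is a hypothetical file length, chosen only because it is roughly consistent with the counts above; the real size of a.list is not given. Each count above also includes the one extra line that `print $ratio, "\n"` writes to standard output.

```latex
\begin{aligned}
K &\sim \operatorname{Binomial}(N,\ p), \qquad p = \tfrac{1}{r} \\
\mathbb{E}[K] &= Np, \qquad \sigma_K = \sqrt{Np(1-p)} \\
N = 48\,000,\ r = 10:\quad \mathbb{E}[K] &= 4800, \qquad \sigma_K = \sqrt{4320} \approx 65.7
\end{aligned}
```

Under that assumption, counts within roughly two standard deviations (about ±130) of the mean are expected. That matches the spread seen in the ten runs.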

The other thought I had was slurping the input file and then choosing n at random from the array, but that's a problem if I have a really big file.
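For comparison, here is a minimal sketch of the slurp-and-choose approach. It assumes the core `List::Util` module's `shuffle`, and the function name `slurp_sample` is hypothetical. It returns exactly n lines (or every line if the file is shorter), but memory grows with the file size, which is the drawback described above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Slurp approach (slurp_sample is a hypothetical name): read every line,
# shuffle them, and keep the first $n. Exact count, but O(N) memory.
sub slurp_sample {
    my ($fh, $n) = @_;
    my @lines = <$fh>;                   # the whole file is held in memory
    $n = @lines if $n > @lines;          # can't return more lines than exist
    return (shuffle @lines)[0 .. $n - 1];
}

# Command-line use: slurp_sample.pl N [file ...]  (reads STDIN if no file)
if (@ARGV) {
    my $n = shift @ARGV;
    die "usage: $0 N [file ...]\n" unless $n =~ /^\d+$/;
    print slurp_sample(\*ARGV, $n);
}
```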

Any ideas?

Solution

Here's a nice one-pass algorithm that I just came up with, having O(N) time complexity and O(M) space complexity, for reading M lines from an N-line file. (The technique is widely known as reservoir sampling.)

Assume M <= N.

1. Let S be the set of chosen lines. Initialize S to the first M lines of the file. If the ordering of the final result is important, shuffle S now.
2. Read in the next line l. So far, we have read n = M + 1 total lines, so the probability that l should be one of our final lines is M/n.
3. Accept l with probability M/n; use an RNG to decide whether to accept or reject l.
4. If l has been accepted, choose one of the lines in S uniformly at random and replace it with l.
5. Repeat steps 2-4 until the file is exhausted, incrementing n with each new line read.
6. Return the set S of chosen lines.
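Here is a short check, in the answer's notation, of why accepting with probability M/n is the right choice: every line ends up in S with the same probability M/N. A line i > M enters S with probability M/i. Each later line j then evicts it only if j is accepted (probability M/j) and happens to pick its slot (probability 1/M). A line i ≤ M starts in S and faces the same evictions from line M + 1 onward.

```latex
\begin{aligned}
\Pr[\text{line } i \in S \text{ at the end}]
  &= \frac{M}{i}\prod_{j=i+1}^{N}\left(1 - \frac{M}{j}\cdot\frac{1}{M}\right)
   = \frac{M}{i}\prod_{j=i+1}^{N}\frac{j-1}{j}
   = \frac{M}{i}\cdot\frac{i}{N}
   = \frac{M}{N}, \qquad i > M, \\
\Pr[\text{line } i \in S \text{ at the end}]
  &= \prod_{j=M+1}^{N}\frac{j-1}{j} = \frac{M}{N}, \qquad i \le M.
\end{aligned}
```

The product telescopes, so no line is favoured by its position in the file.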
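The steps above can be sketched in Perl as follows. This is a minimal illustration, not the answerer's own code. The names `reservoir_sample` and `random_lines.pl` are hypothetical, and reading from `\*ARGV` is assumed so that it behaves like `<>` (files from the command line, else STDIN).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reservoir sampling following the steps above (reservoir_sample is a
# hypothetical name): return min($m, N) lines chosen uniformly at random
# from a filehandle, in one pass and O($m) memory.
sub reservoir_sample {
    my ($fh, $m) = @_;
    my @sample;                          # the set S
    my $n = 0;                           # number of lines read so far
    while (my $line = <$fh>) {
        $n++;
        if ($n <= $m) {
            push @sample, $line;         # step 1: S starts as the first $m lines
        }
        elsif (rand($n) < $m) {          # steps 2-3: accept with probability $m/$n
            $sample[int rand $m] = $line;    # step 4: overwrite a random member of S
        }
    }
    return @sample;                      # step 6
}

# Command-line use: random_lines.pl N [file ...]  (reads STDIN if no file)
if (@ARGV) {
    my $m = shift @ARGV;
    die "usage: $0 N [file ...]\n" unless $m =~ /^\d+$/;
    print reservoir_sample(\*ARGV, $m);
}
```

With this sketch, `perl random_lines.pl 10 a.list | wc -l` should report 10 whenever a.list has at least 10 lines. The sketch skips the optional shuffle from step 1. If the order of the output matters, `List::Util`'s `shuffle` can be applied to the returned list instead.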
