Read random rows from the dataset


Problem description

I have a dataset with 1000 records.

This is the list used to store the randomly selected records:

private static List<string>[] BChrom = new List<string>[10];



How can I randomly add 20% of the whole dataset to that list of strings?

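try
{
    using (StreamReader sr = new StreamReader(@"C:\Users\*****\Documents\sub0000.data"))
    {
        for (int i = 0; i < BChrom.Length; i++)
        {
            // the random selection should go here
        }
    }
}
catch (IOException ex)
{
    Console.WriteLine(ex.Message); // report file access errors
}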

Solution

Try this:

// requires: using System; using System.Collections; using System.IO;
ArrayList ReadRandom(string sourceFile, int sampleSize)
{
    ArrayList BChrom = new ArrayList(sampleSize);
    Random random = new Random();

    // the using blocks close the reader and the stream even if an exception is thrown
    using (FileStream ifs = new FileStream(sourceFile, FileMode.Open, FileAccess.Read))
    using (StreamReader sr = new StreamReader(ifs))
    {
        // determine extent of source file
        long lastPos = ifs.Length;

        for (int i = 0; i < sampleSize; ++i)
        {
            // generate a random position
            double pct = random.NextDouble(); // [0.0, 1.0)
            long randomPos = (long)(pct * lastPos);
            if (pct >= 0.99)
                randomPos = Math.Max(0, randomPos - 1024); // if near the end, back up a bit, but never before the start

            sr.BaseStream.Seek(randomPos, SeekOrigin.Begin);
            sr.DiscardBufferedData(); // drop buffered data so the next read starts at randomPos

            sr.ReadLine();               // consume the current partial line
            string line = sr.ReadLine(); // a full line, or null if the seek landed in the last line

            if (line != null)
                BChrom.Add(line); // the result can hold fewer than sampleSize lines
        }
    }

    return BChrom;
}



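There are some drawbacks: the first line of the file is never returned (the line at the seek position is always discarded), a seek that lands in the last line yields no complete line, files smaller than 1024 bytes are an edge case for the back-up step, the same line can be picked more than once, and a line that follows a long line is more likely to be chosen. In return, the method never reads the whole file, so it stays fast on large files.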
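
For a dataset as small as the 1000 records in the question, a simpler option might be to load every line and draw the 20% sample in memory. The sketch below is not part of the original answer. It assumes the records fit in memory and that the fraction lies between 0 and 1, and the names RandomSample and Take are hypothetical. It uses a partial Fisher-Yates shuffle, so no record is picked twice and every record has the same chance of being chosen.

```csharp
using System;
using System.Collections.Generic;

static class RandomSample
{
    // Hypothetical helper: returns a uniform random sample holding `fraction`
    // of the items, without repeats. Assumes 0 <= fraction <= 1.
    public static List<string> Take(IList<string> items, double fraction, Random random)
    {
        // Work on a copy so the caller's collection is left untouched.
        string[] pool = new string[items.Count];
        items.CopyTo(pool, 0);

        int sampleSize = (int)Math.Round(pool.Length * fraction);
        List<string> sample = new List<string>(sampleSize);

        // Partial Fisher-Yates shuffle: at step i, swap a randomly chosen
        // element from the not-yet-picked tail into slot i and keep it.
        for (int i = 0; i < sampleSize; i++)
        {
            int j = random.Next(i, pool.Length); // i <= j < pool.Length
            string tmp = pool[i];
            pool[i] = pool[j];
            pool[j] = tmp;
            sample.Add(pool[i]);
        }

        return sample;
    }
}
```

With the lines loaded by File.ReadAllLines (from System.IO), a call such as RandomSample.Take(lines, 0.20, new Random()) should return 200 of the 1000 records. The trade-off against the seek-based method is that the whole file is read once, which is fine at this size but less attractive for very large files.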

