Read random rows from the dataset
Problem description
I have a dataset with 1000 records.
This is the list used to store the randomly selected records:
private static List<string>[] BChrom = new List<string>[10];
How can I add a random 20% of the whole dataset to that list of strings?
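try
{
    using (var sr = new StreamReader(@"C:\Users\*****\Documents\sub0000.data"))
    {
        for (int i = 0; i < BChrom.Length; i++)
        {
        }
    }
}
catch (IOException ex)
{
    // the file is missing or unreadable
    Console.Error.WriteLine(ex.Message);
}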
Solution
Try this:
using System;
using System.Collections;
using System.IO;

ArrayList ReadRandom(string sourceFile, int sampleSize)
{
    ArrayList BChrom = new ArrayList(sampleSize);
    Random random = new Random();
    using (FileStream ifs = new FileStream(sourceFile, FileMode.Open, FileAccess.Read))
    using (StreamReader sr = new StreamReader(ifs))
    {
        // determine extent of source file
        long lastPos = sr.BaseStream.Seek(0, SeekOrigin.End);
        for (int i = 0; i < sampleSize; ++i)
        {
            // generate a random position
            double pct = random.NextDouble(); // [0.0, 1.0)
            long randomPos = (long)(pct * lastPos);
            if (pct >= 0.99)
                randomPos = Math.Max(0, randomPos - 1024); // if near the end, back up a bit, but never before the start
            sr.BaseStream.Seek(randomPos, SeekOrigin.Begin);
            sr.DiscardBufferedData(); // drop the reader's buffer so it resyncs with the new stream position
            sr.ReadLine(); // consume the current partial line
            string line = sr.ReadLine(); // this will be a full line, or null at end of file
            if (line != null)
                BChrom.Add(line); // a null is skipped, so fewer than sampleSize lines may come back
        }
    }
    return BChrom;
}
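There are some drawbacks: the first line is never returned (the first ReadLine always discards the line the random position lands in), a position that lands in the last line yields no full line, the same line can be picked more than once, and the 1024-byte back-up assumes the file is larger than that. In exchange, each sample costs one seek and two line reads regardless of file size, so it stays fast on large files.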
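For a dataset as small as the 1000 records in the question, a simpler route may be to read every line and pick an exact 20% without repeats. The following is only a sketch under that assumption, not the approach from the answer above: it uses a partial Fisher-Yates shuffle, and the names `RandomSample` and `SampleFraction` are hypothetical, introduced here for illustration. It also assumes one record per line.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class RandomSample
{
    // Hypothetical helper (not part of the original answer): returns the given
    // fraction of the input lines, chosen uniformly at random without repeats.
    public static List<string> SampleFraction(IReadOnlyList<string> lines, double fraction, Random random)
    {
        if (fraction < 0.0 || fraction > 1.0)
            throw new ArgumentOutOfRangeException(nameof(fraction));

        int sampleSize = (int)Math.Round(lines.Count * fraction);

        // Work on a copy so the caller's data keeps its order.
        List<string> pool = new List<string>(lines);

        // Partial Fisher-Yates shuffle: only the first sampleSize slots are settled.
        for (int i = 0; i < sampleSize; i++)
        {
            int j = random.Next(i, pool.Count); // i <= j < pool.Count
            string tmp = pool[i];
            pool[i] = pool[j];
            pool[j] = tmp;
        }

        return pool.GetRange(0, sampleSize);
    }

    static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.Error.WriteLine("usage: RandomSample <path-to-data-file>");
            return;
        }

        // Assumption: the file fits in memory and holds one record per line.
        string[] lines = File.ReadAllLines(args[0]);
        List<string> sample = SampleFraction(lines, 0.20, new Random());
        Console.WriteLine($"Picked {sample.Count} of {lines.Length} lines");
    }
}
```

Unlike the seek-based version, this reads the whole file, so it trades speed on very large files for an exact sample size, no duplicates, and an equal chance for every line, including the first and last.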