Outputting a dictionary optimally
Problem description
I have 4 dictionaries that contain 800k strings of 200 to 6000 characters each. When I load them into memory, they take up about 11 GB. It takes 2 minutes to parse the data and 2 minutes to output it. Is there any way to output the data faster than what I am using below? I am only getting 20-31 MB per second of disk I/O, and I know the hard drive can do 800ish.
var hashs1 = new Dictionary<int, Dictionary<string, string>>(f.Count + 2);
var hashs2 = new Dictionary<int, Dictionary<string, string>>(f.Count + 2);
var hashs3 = new Dictionary<int, Dictionary<string, string>>(f.Count + 2);
var hashs4 = new Dictionary<int, Dictionary<string, string>>(f.Count + 2);
....
foreach (var me in mswithfilenames)
{
    filename = me.Key.ToString();
    // One output file per index; the file name doubles as the lookup key.
    string filenamef = filename + "index1";
    string filenameq = filename + "index2";
    string filenamefq = filename + "index3";
    string filenameqq = filename + "index4";
    StreamWriter sw = File.AppendText(filenamef);
    StreamWriter sw2 = File.AppendText(filenameq);
    StreamWriter swq = File.AppendText(filenamefq);
    StreamWriter sw2q = File.AppendText(filenameqq);
    // A single pass over the records that writes to all four files in turn.
    for (i = 0; i <= totalinhash; i++)
    {
        if (hashs1[i].ContainsKey(filenamef))
        {
            sw.Write(hashs1[i][filenamef]);
        }
        if (hashs2[i].ContainsKey(filenameq))
        {
            sw2.Write(hashs2[i][filenameq]);
        }
        if (hashs3[i].ContainsKey(filenamefq))
        {
            swq.Write(hashs3[i][filenamefq]);
        }
        if (hashs4[i].ContainsKey(filenameqq))
        {
            sw2q.Write(hashs4[i][filenameqq]);
        }
    }
    sw.Close();
    sw2.Close();
    swq.Close();
    sw2q.Close();
}
Recommended answer
The most expensive part is the I/O. And this loop:
for (i = 0; i <= totalinhash; i++)
{
    if (hashs1[i].ContainsKey(filenamef))
    {
        sw.Write(hashs1[i][filenamef]);
    }
    if (hashs2[i].ContainsKey(filenameq))
    {
        sw2.Write(hashs2[i][filenameq]);
    }
    ...
}
is alternating between different files. That will probably cause some extra head-movement and it creates fragmented files (slowing future actions on those files).
I would use:
for (i = 0; i <= totalinhash; i++)
{
    if (hashs1[i].ContainsKey(filenamef))
    {
        sw.Write(hashs1[i][filenamef]);
    }
}
for (i = 0; i <= totalinhash; i++)
{
    if (hashs2[i].ContainsKey(filenameq))
    {
        sw2.Write(hashs2[i][filenameq]);
    }
}
...
But of course you should measure this. It won't make much difference on SSDs for instance, only on mechanical disks.
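To make the measurement concrete, here is a minimal sketch that puts both write patterns side by side with a small timing helper. It is not the asker's actual code: the class `DictionaryDump`, the methods `WriteInterleaved`, `WriteSequential` and `Time`, and the representation of each index as a `List<Dictionary<string, string>>` are all assumptions made for illustration. As in the question, the output path is also used as the lookup key.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

// Hypothetical helper class; the names are illustrative, not from the question.
public static class DictionaryDump
{
    // Pattern from the question: one pass over the records,
    // alternating between all output files.
    public static void WriteInterleaved(
        IReadOnlyList<List<Dictionary<string, string>>> tables,
        IReadOnlyList<string> paths)
    {
        var writers = new StreamWriter[paths.Count];
        try
        {
            for (int t = 0; t < paths.Count; t++)
            {
                writers[t] = File.AppendText(paths[t]);
            }

            int maxCount = 0;
            foreach (var table in tables)
            {
                maxCount = Math.Max(maxCount, table.Count);
            }

            for (int i = 0; i < maxCount; i++)
            {
                for (int t = 0; t < paths.Count; t++)
                {
                    // TryGetValue does the lookup once instead of ContainsKey + indexer.
                    if (i < tables[t].Count &&
                        tables[t][i].TryGetValue(paths[t], out string value))
                    {
                        writers[t].Write(value);
                    }
                }
            }
        }
        finally
        {
            foreach (var writer in writers)
            {
                writer?.Dispose();
            }
        }
    }

    // Pattern from the answer: finish one file completely before opening the next.
    public static void WriteSequential(
        IReadOnlyList<List<Dictionary<string, string>>> tables,
        IReadOnlyList<string> paths)
    {
        for (int t = 0; t < paths.Count; t++)
        {
            using (StreamWriter writer = File.AppendText(paths[t]))
            {
                foreach (var record in tables[t])
                {
                    if (record.TryGetValue(paths[t], out string value))
                    {
                        writer.Write(value);
                    }
                }
            }
        }
    }

    // Returns the wall-clock time taken by the given action.
    public static TimeSpan Time(Action action)
    {
        var stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

Calling `DictionaryDump.Time(() => DictionaryDump.WriteSequential(tables, paths))` and then the same for `WriteInterleaved` on the real data and the real disk gives two durations to compare. Both methods append, so each run should target fresh output files. Operating-system write caching can hide part of the difference on small inputs, so the comparison is only meaningful at something close to the full data size.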