Outputting dictionaries optimally

Question

I have 4 dictionaries that contain 800k strings of 200 to 6,000 characters each. Loaded into memory, they take up about 11 GB. Parsing the data takes 2 minutes and writing it out takes another 2 minutes. Is there any way to output the data faster than with the code below? I am only getting 20-31 MB per second of disk I/O, and I know the hard drive can do around 800.

var hashs1 = new Dictionary<int, Dictionary<string, string>>(f.Count + 2);
var hashs2 = new Dictionary<int, Dictionary<string, string>>(f.Count + 2);
var hashs3 = new Dictionary<int, Dictionary<string, string>>(f.Count + 2);
var hashs4 = new Dictionary<int, Dictionary<string, string>>(f.Count + 2);
....
foreach (var me in mswithfilenames)
{
    // Each source file name produces four output files.
    filename = me.Key.ToString();
    string filenamef = filename + "index1";
    string filenameq = filename + "index2";
    string filenamefq = filename + "index3";
    string filenameqq = filename + "index4";

    // One appending writer per output file.
    StreamWriter sw = File.AppendText(filenamef);
    StreamWriter sw2 = File.AppendText(filenameq);
    StreamWriter swq = File.AppendText(filenamefq);
    StreamWriter sw2q = File.AppendText(filenameqq);

    // Every iteration writes to all four files in turn.
    for (i = 0; i <= totalinhash; i++)
    {
        if (hashs1[i].ContainsKey(filenamef))
        {
            sw.Write(hashs1[i][filenamef]);
        }
        if (hashs2[i].ContainsKey(filenameq))
        {
            sw2.Write(hashs2[i][filenameq]);
        }
        if (hashs3[i].ContainsKey(filenamefq))
        {
            swq.Write(hashs3[i][filenamefq]);
        }

        if (hashs4[i].ContainsKey(filenameqq))
        {
            sw2q.Write(hashs4[i][filenameqq]);
        }
    }

    sw.Close();
    sw2.Close();
    swq.Close();
    sw2q.Close();
}

Recommended answer

The most expensive part is the I/O, and this loop:

for (i = 0; i <= totalinhash; i++)
{
    if (hashs1[i].ContainsKey(filenamef))
    {
        sw.Write(hashs1[i][filenamef]);
    }
    if (hashs2[i].ContainsKey(filenameq))
    {
        sw2.Write(hashs2[i][filenameq]);
    }
    ...
}

alternates between different files. That will probably cause some extra head movement, and it creates fragmented files, which slows down later operations on those files.

I would use:

for (i = 0; i <= totalinhash; i++)
{
    if (hashs1[i].ContainsKey(filenamef))
    {
        sw.Write(hashs1[i][filenamef]);
    }
}

for (i = 0; i <= totalinhash; i++)
{
    if (hashs2[i].ContainsKey(filenameq))
    {
        sw2.Write(hashs2[i][filenameq]);
    }
}
...
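
The one-file-at-a-time idea can also be packaged as a small helper. This is only a sketch under assumptions: `SequentialOutput`, `WriteAll`, and the 1 MiB buffer size are hypothetical and not part of the original code, and `TryGetValue` is swapped in for the `ContainsKey` plus indexer pair to avoid a second lookup.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

static class SequentialOutput
{
    // Hypothetical helper: appends every value stored under `key` to `path`,
    // finishing this file completely before the caller starts the next one.
    public static void WriteAll(
        string path,
        Dictionary<int, Dictionary<string, string>> tables,
        int lastIndex,
        string key,
        int bufferSize = 1 << 20) // 1 MiB is an assumed starting point, not a tuned value
    {
        // UTF-8 without a byte order mark, the same encoding File.AppendText uses.
        var encoding = new UTF8Encoding(false);

        using (var writer = new StreamWriter(path, true, encoding, bufferSize))
        {
            for (int i = 0; i <= lastIndex; i++)
            {
                // Skip indexes that are missing or that hold nothing for this key.
                if (tables.TryGetValue(i, out var inner) &&
                    inner.TryGetValue(key, out var value))
                {
                    writer.Write(value);
                }
            }
        }
    }
}
```

With the names from the question, the calls would look like `SequentialOutput.WriteAll(filenamef, hashs1, totalinhash, filenamef);` followed by the same call for `hashs2` and `filenameq`, and so on. Whether the larger buffer helps at all depends on the disk and the runtime.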

But of course you should measure this. It won't make much difference on SSDs, for instance, only on mechanical disks.
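
One rough way to take that measurement is to time each variant with `Stopwatch` and divide the bytes added to the file by the elapsed time. The sketch below assumes a hypothetical `WriteTiming.MeasureMegabytesPerSecond` helper that is not in the original post. The operating system's write cache can inflate the figure, so treat the result as a relative comparison between variants, not as the disk's true throughput.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class WriteTiming
{
    // Hypothetical helper: runs `write`, then reports how many megabytes per
    // second were added to the file at `path` during that call.
    public static double MeasureMegabytesPerSecond(string path, Action write)
    {
        // The writers append, so only the growth of the file is counted.
        long before = File.Exists(path) ? new FileInfo(path).Length : 0;

        var stopwatch = Stopwatch.StartNew();
        write();
        stopwatch.Stop();

        long written = new FileInfo(path).Length - before;
        double seconds = Math.Max(stopwatch.Elapsed.TotalSeconds, 1e-9); // avoid dividing by zero
        return written / (1024.0 * 1024.0) / seconds;
    }
}
```

Run it once around the interleaved loop and once around the per-file loops, on the same data and the same disk, and compare the two numbers.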
