Optimize large dictionary for reading in .NET


Problem Description

I have a list of about 22k number-string pairs (it is a MAC address vendor list).

In my code, I look up the vendor name by the first three bytes of the MAC address.

I know I could use a dictionary, or even an array, but the dictionary would have to be initialized every time the program runs. The program only ever looks up a small fraction of the entries (under one percent of the items in the dictionary), and initializing the whole dictionary takes a significant amount of time at startup.
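(For reference, a minimal sketch of the dictionary approach I mean, assuming the table is a Dictionary<string, string> keyed on the six-hex-digit OUI prefix; the Vendors table and the LookupVendor name are just illustrative:)

using System;
using System.Collections.Generic;

static class VendorLookup
{
    // Hypothetical vendor table, keyed on the first three bytes (the OUI) of the MAC address.
    static readonly Dictionary<string, string> Vendors =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "00125A", "Microsoft Corporation" },
            { "00124E", "XAC AUTOMATION CORP." }
            // ... roughly 22,000 more entries - exactly the initialization cost in question
        };

    // Strips common separators and looks the vendor up by the six-hex-digit prefix.
    public static string LookupVendor(string mac)
    {
        string prefix = mac.Replace(":", "").Replace("-", "").Substring(0, 6);
        string vendor;
        return Vendors.TryGetValue(prefix, out vendor) ? vendor : null;
    }
}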

Can you imagine any other method? In old VB6 it was possible to read a binary file and seek to a record, which would be good enough for me, because then I would load only the values I actually need.

I would prefer an in-project solution, so that there is no external data file. I am trying to use code like:

Vendors.add("00125A","Microsoft Corporation") 
'... this in another 22000 times '
Vendors.add("00124E","XAC AUTOMATION CORP.")

Answer

Not sure what the best course of action would be for you, or whether this will actually help, but..

You seem to be looking for a way to seek to and read a specific record from a structured file.

For this you can define a class that encapsulates the record fields together with the access methods.

Here is an example. On my machine it creates and stores 22k+ records and seeks out a few of them, all in around 20 ms. On the other hand, doing 100 random seeks takes about 3.5 seconds, obviously because each seek always starts again at the beginning of the file. A sequential search is rather fast again.

Of course, the total time will depend on your machine and on how many records you seek and read.

Here is a record class that holds a byte, a long and a string:

using System.IO;

class aRecord
{
    // Public so callers can read the fields of a record they have seeked out.
    public byte aByte { get; set; }
    public long aLong { get; set; }
    public string aString { get; set; }

    public aRecord() { }

    public aRecord(byte b_, long l_, string s_)
    { aByte = b_; aLong = l_; aString = s_; }

    // Appends this record to the stream in a fixed field order.
    public void writeToStream(BinaryWriter bw)
    {
        bw.Write(aByte);
        bw.Write(aLong);
        bw.Write(aString);   // length-prefixed string
    }

    // Reads one record from the current stream position.
    public void readFromStream(BinaryReader br)
    {
        aByte = br.ReadByte();
        aLong = br.ReadInt64();
        aString = br.ReadString();
    }

    // Seeks the record with the given index by reading sequentially from the start of the file.
    public static aRecord readFromStream(BinaryReader br, int record)
    {
        int r = 0;
        aRecord rec = new aRecord();
        br.BaseStream.Position = 0;
        while (br.BaseStream.Position < br.BaseStream.Length && r <= record)
        {
            rec.readFromStream(br);
            r++;
        }
        return rec;
    }

    // Searches forward from the current position and returns the first record
    // whose string contains the search term; returns null if nothing more is found.
    public static aRecord readFromStream(BinaryReader br, string search)
    {
        aRecord rec = new aRecord();
        while (br.BaseStream.Position < br.BaseStream.Length)
        {
            rec.readFromStream(br);
            if (rec.aString.Contains(search)) return rec;
        }
        return null;
    }
}

I tested it like this:

Console.WriteLine(DateTime.Now.ToString("ss,ffff") + "  init ");

// Build ~22,000 sample records.
List<aRecord> data = new List<aRecord>();

Random rnd = new Random(9);

int count = 23000;
for (int i = 1000; i < count; i++)
{
    data.Add(new aRecord((byte)(i % 128), i, "X" + rnd.Next(13456).ToString()));
}

Console.WriteLine(DateTime.Now.ToString("ss,ffff") + "  write ");

string fileName = "D:\\_DataStream.dat";

// Write all records sequentially to a binary file.
FileStream sw = new FileStream(fileName, FileMode.Create);
BinaryWriter bw = new BinaryWriter(sw);

foreach (aRecord r in data)
{
    r.writeToStream(bw);
}
bw.Flush();
bw.Close();   // closing the writer also closes the underlying stream
sw.Close();

FileStream sr = new FileStream(fileName, FileMode.Open);
BinaryReader br = new BinaryReader(sr);

// Scan the file, collecting records whose string contains "911";
// each call continues from where the previous hit left off.
List<aRecord> data2 = new List<aRecord>();
Console.WriteLine(DateTime.Now.ToString("ss,ffff") + "  begin search");
for (int i = 0; i < 100; i++)
{
    aRecord rec = aRecord.readFromStream(br, "911");
    if (rec != null) data2.Add(rec);
}
Console.WriteLine(DateTime.Now.ToString("ss,ffff") + "  done. found " + data2.Count);


// Seek individual records by index; every call rescans from the start of the file.
Console.WriteLine(DateTime.Now.ToString("ss,ffff") + "  seek ");

aRecord ar = aRecord.readFromStream(br, 0);
Console.WriteLine(DateTime.Now.ToString("ss,ffff") + " 0 ");

aRecord ar1 = aRecord.readFromStream(br, 1);
Console.WriteLine(DateTime.Now.ToString("ss,ffff") + " 1 ");

aRecord ar2 = aRecord.readFromStream(br, 13000);
Console.WriteLine(DateTime.Now.ToString("ss,ffff") + " 13000 ");

aRecord ar3 = aRecord.readFromStream(br, 23000 - 1);
Console.WriteLine(DateTime.Now.ToString("ss,ffff") + " 23000 end ");

br.Close();
sr.Close();

Your title is concerned with optimizing a Dictionary. That depends on what the main use will be: reading or writing? If you read from the dictionary a lot, it is best to create a SortedDictionary. If you need to create many more entries than you expect to read, a normal Dictionary would be better.
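For illustration, a minimal sketch of that choice, reusing the two sample pairs from the question; the class and variable names here are made up, and both collections are read the same way through TryGetValue:

using System;
using System.Collections.Generic;

class DictionaryChoiceDemo
{
    static void Main()
    {
        // Hash-based Dictionary: unordered, looked up by hashing the key.
        var plain = new Dictionary<string, string>
        {
            { "00125A", "Microsoft Corporation" },
            { "00124E", "XAC AUTOMATION CORP." }
        };

        // SortedDictionary: keys kept in sorted order; lookups walk a balanced tree,
        // which also allows ordered traversal and range-style scans.
        var sorted = new SortedDictionary<string, string>(plain);

        string vendor;
        if (sorted.TryGetValue("00125A", out vendor))
            Console.WriteLine(vendor);    // Microsoft Corporation
    }
}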

There are even more collection classes, but the first thing is to find out what the true bottleneck is. The seek-and-read routine above does not waste time inserting the data into a Dictionary; it simply discards records until the right one is found. I have also added a search method that continues from the same position after each hit. Expanding the class to suit your own needs is rather simple.
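As a small usage sketch of that search overload (assuming the BinaryReader br from the test above is still open; "911" is just the term used there): since the method does not rewind the stream, calling it in a loop walks forward hit by hit until the end of the file:

// Rewind once, then let each call resume right after the previous hit.
br.BaseStream.Position = 0;
List<aRecord> hits = new List<aRecord>();
aRecord hit;
while ((hit = aRecord.readFromStream(br, "911")) != null)
{
    hits.Add(hit);
}
Console.WriteLine("found " + hits.Count);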

The timing output of the run above (timestamps formatted as ss,ffff):

27,2208  init
27,2297  write
27,2438  seek
27,2438  begin search
27,3097  done. found 38
27,3097 0 end
27,3097 1 end
27,3457 13000 end
27,4037 23000 end
