How fast is String.Substring relative to other methods of string processing?

Question

I'm using VB.NET to process a long fixed-length record. The simplest option seems to be to load the whole record into a string and use Substring to access the fields by position and length. But it seems there will be some redundant processing inside Substring on every single call, which made me wonder whether a stream- or array-based approach would give better results.

The content starts out as a byte array containing UTF-8 character data. A couple of other approaches I've thought of are listed below.

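  1. Load the string into a StringReader and read it a block at a time
  2. Convert the byte array to a char array, and access the characters by position within the array
  3. (This one seems dumb, but I'll throw it out there) Copy the byte array to a MemoryStream and use a StreamReader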
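
For comparison, here is a rough sketch of the Substring approach next to options 1 and 2. It is written in C#, the language used by the code in the answer below; the same framework calls are available from VB.NET. The class and method names are made up for illustration. It also assumes field offsets are counted in characters, which matches byte offsets only when the UTF-8 data is plain ASCII.

```csharp
using System;
using System.IO;
using System.Text;

public static class FieldAccessSketch
{
    // Baseline: decode the whole record to a string, then slice with Substring.
    public static string ViaSubstring(byte[] utf8Record, int start, int length)
    {
        string record = Encoding.UTF8.GetString(utf8Record);
        return record.Substring(start, length);
    }

    // Option 2: decode to a char array and build a string only for the field needed.
    public static string ViaCharArray(byte[] utf8Record, int start, int length)
    {
        char[] chars = Encoding.UTF8.GetChars(utf8Record);
        return new string(chars, start, length);
    }

    // Option 1: walk the record sequentially with a StringReader, one field per block.
    // fieldLengths lists the width of each field in record order (hypothetical input).
    public static string[] ViaStringReader(string record, int[] fieldLengths)
    {
        var fields = new string[fieldLengths.Length];
        using (var reader = new StringReader(record))
        {
            for (int i = 0; i < fieldLengths.Length; i++)
            {
                var buffer = new char[fieldLengths[i]];
                int read = reader.ReadBlock(buffer, 0, buffer.Length);
                fields[i] = new string(buffer, 0, read);
            }
        }
        return fields;
    }
}
```

All three variants end up allocating a new string per field, so any difference between them comes from the bookkeeping around that copy rather than from avoiding it.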

This is definitely premature optimization; the Substring approach may be perfectly acceptable even if it's a few milliseconds slower. But I thought I'd ask before coding it, just to see if anyone could think of a reason to use one of the other approaches.

Recommended answer

The primary cost of Substring is copying the excised characters into a newly allocated string. Using Reflector you can see this:

private unsafe string InternalSubString(int startIndex, int length, bool fAlwaysCopy)
{
    if (((startIndex == 0) && (length == this.Length)) && !fAlwaysCopy)
    {
        return this;
    }
    string str = FastAllocateString(length);
    fixed (char* chRef = &str.m_firstChar)
    {
        fixed (char* chRef2 = &this.m_firstChar)
        {
            wstrcpy(chRef, chRef2 + startIndex, length);
        }
    }
    return str;
}

Now, to get there (notice that this is InternalSubString, not Substring() itself), the call has to go through 5 checks on length and such.

If you are referencing the same substring multiple times, it may well be worth pulling everything out once and discarding the giant string. You will incur some overhead for the arrays that store all these substrings.
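
One way to "pull everything out once" is to slice the record into an array of field strings up front. The following is a minimal sketch that assumes the field start offsets are known in advance and listed in ascending order; `RecordSplitter` and `SplitRecord` are hypothetical names.

```csharp
using System;

public static class RecordSplitter
{
    // Slices a fixed-length record into its fields in a single pass.
    // starts[i] is the offset of field i; each field ends where the next one
    // begins, and the last field runs to the end of the record.
    public static string[] SplitRecord(string record, int[] starts)
    {
        var fields = new string[starts.Length];
        for (int i = 0; i < starts.Length; i++)
        {
            int end = (i + 1 < starts.Length) ? starts[i + 1] : record.Length;
            fields[i] = record.Substring(starts[i], end - starts[i]);
        }
        return fields;
    }
}
```

After the split, repeated reads of a field are plain array lookups, and the original record string can be garbage collected once nothing else references it.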

If it's generally a "one-off" access, then just use Substring; otherwise consider partitioning the record up front. Perhaps System.Data.DataTable would be of use? If you're doing multiple accesses and parsing to other data types, then DataTable looks more attractive to me. If you only need one record in memory at a time, then a Dictionary<string,object> should be sufficient to hold one record (field names have to be unique).

Alternatively, you could write a custom, generic class that handles fixed-length record reading for you. Indicate the start index and the type of each field. The length of a field is inferred from the start of the next field (the exception is the last field, whose length can be inferred from the total record length). The values can be converted automatically using the likes of int.Parse(), double.Parse(), bool.Parse(), etc.

RecordParser r = new RecordParser();
r.AddField("Name", 0, typeof(string));
r.AddField("Age", 48, typeof(int));
r.AddField("SystemId", 58, typeof(Guid));
r.RecordLength(80);

Dictionary<string, object> data = r.Parse(recordString);
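
The answer only shows how such a class would be called. One possible shape for it is sketched below, assuming fields are added in ascending offset order and values are padded with whitespace. Everything inside the class is illustrative. Guid values go through the Guid(string) constructor, which needs at least 32 hex digits, so it would not fit the 22-character SystemId slot of the layout above as written.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public class RecordParser
{
    private sealed class Field
    {
        public string Name;
        public int Start;
        public Type Type;
    }

    private readonly List<Field> fields = new List<Field>();
    private int recordLength;

    // Assumption: callers add fields in ascending order of their start offset.
    public void AddField(string name, int start, Type type)
    {
        fields.Add(new Field { Name = name, Start = start, Type = type });
    }

    public void RecordLength(int length)
    {
        recordLength = length;
    }

    public Dictionary<string, object> Parse(string record)
    {
        if (record.Length != recordLength)
            throw new ArgumentException("Unexpected record length.", "record");

        var result = new Dictionary<string, object>();
        for (int i = 0; i < fields.Count; i++)
        {
            Field f = fields[i];
            // A field ends where the next one starts; the last one ends with the record.
            int end = (i + 1 < fields.Count) ? fields[i + 1].Start : recordLength;
            string raw = record.Substring(f.Start, end - f.Start).Trim();
            // Add throws on a duplicate key, which enforces unique field names.
            result.Add(f.Name, ConvertField(raw, f.Type));
        }
        return result;
    }

    private static object ConvertField(string raw, Type type)
    {
        if (type == typeof(string)) return raw;
        if (type == typeof(Guid)) return new Guid(raw);
        // Covers int, double, bool and other IConvertible targets.
        return Convert.ChangeType(raw, type, CultureInfo.InvariantCulture);
    }
}
```

Convert.ChangeType stands in here for the individual int.Parse(), double.Parse() and bool.Parse() calls; a switch over the type with explicit Parse calls would work just as well.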

If reflection suits your fancy:

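[RecordLength(80)]
public class MyRecord
{
    // Public read/write properties, so they can be found via PropertyInfo.
    [RecordFieldOffset(0)]
    public string Name { get; set; }

    [RecordFieldOffset(48)]
    public int Age { get; set; }

    [RecordFieldOffset(58)]
    public Guid SystemId { get; set; }
}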

Simply run through the properties: PropertyInfo.PropertyType tells you how to convert each substring of the record, the attributes give you the offsets and the total length, and you return an instance of your class with the data populated. Essentially, you use reflection to pull out the information needed to call RecordParser.AddField() and RecordLength() from my previous suggestion.

Then wrap it all up into a neat little, no-fuss class:

RecordParser<MyRecord> r = new RecordParser<MyRecord>();
MyRecord data = r.Parse(recordString);
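
A reflection-based version might look roughly like the sketch below. It is illustrative only: the two attribute classes, the `SampleRecord` type and the conversion rules are assumptions, and it expects the annotated members to be public read/write properties.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Reflection;

[AttributeUsage(AttributeTargets.Class)]
public sealed class RecordLengthAttribute : Attribute
{
    public int Length { get; private set; }
    public RecordLengthAttribute(int length) { Length = length; }
}

[AttributeUsage(AttributeTargets.Property)]
public sealed class RecordFieldOffsetAttribute : Attribute
{
    public int Offset { get; private set; }
    public RecordFieldOffsetAttribute(int offset) { Offset = offset; }
}

// Small hypothetical record type used to exercise the parser.
[RecordLength(8)]
public class SampleRecord
{
    [RecordFieldOffset(0)] public string Name { get; set; }
    [RecordFieldOffset(5)] public int Age { get; set; }
}

public class RecordParser<T> where T : new()
{
    public T Parse(string record)
    {
        var lengthAttr = (RecordLengthAttribute)Attribute.GetCustomAttribute(
            typeof(T), typeof(RecordLengthAttribute));
        if (lengthAttr == null)
            throw new InvalidOperationException("Missing [RecordLength] on " + typeof(T).Name);

        // Pair each annotated property with its offset and sort by offset,
        // because GetProperties() does not guarantee declaration order.
        var fields = typeof(T).GetProperties()
            .Select(p => new
            {
                Property = p,
                Attr = (RecordFieldOffsetAttribute)Attribute.GetCustomAttribute(
                    p, typeof(RecordFieldOffsetAttribute))
            })
            .Where(x => x.Attr != null)
            .OrderBy(x => x.Attr.Offset)
            .ToArray();

        T result = new T();
        for (int i = 0; i < fields.Length; i++)
        {
            int start = fields[i].Attr.Offset;
            int end = (i + 1 < fields.Length) ? fields[i + 1].Attr.Offset : lengthAttr.Length;
            string raw = record.Substring(start, end - start).Trim();
            Type type = fields[i].Property.PropertyType;
            object value = type == typeof(Guid)
                ? (object)new Guid(raw)
                : Convert.ChangeType(raw, type, CultureInfo.InvariantCulture);
            fields[i].Property.SetValue(result, value, null);
        }
        return result;
    }
}
```

With these definitions, `new RecordParser<SampleRecord>().Parse("Bob  042")` fills Name from the first five characters and Age from the last three. If many records are parsed, caching the reflected field list per type would avoid repeating the attribute lookups on every call.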

You could even go so far as to call r.EnumerateFile(@"path\to\file") and use the yield return syntax to enumerate the parsed records:

RecordParser<MyRecord> r = new RecordParser<MyRecord>();
foreach (MyRecord data in r.EnumerateFile("foo.dat"))
{
    // Do stuff with record
}
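
The enumeration itself could be sketched as follows. It is written as a standalone helper rather than as a method on the parser, so the parsing step is passed in as a delegate. The `RecordFile` name and the assumption that records sit back to back in a UTF-8 file with no line terminators are mine, not the answer's.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class RecordFile
{
    // Lazily reads fixed-length records from a text file, one per iteration.
    // A trailing partial record (fewer than recordLength characters) is ignored.
    public static IEnumerable<T> EnumerateFile<T>(
        string path, int recordLength, Func<string, T> parse)
    {
        using (var reader = new StreamReader(path, Encoding.UTF8))
        {
            var buffer = new char[recordLength];
            // ReadBlock keeps reading until the buffer is full or the file ends.
            while (reader.ReadBlock(buffer, 0, recordLength) == recordLength)
            {
                yield return parse(new string(buffer));
            }
        }
    }
}
```

Because of yield return, the file is opened only when enumeration starts and only one record is held in memory at a time; the reader is disposed when the foreach loop finishes or exits early.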
