What is the fastest way to iterate through individual characters in a string in C#?

Problem description

The title is the question. Below is my attempt to answer it through research. But I don't trust my uninformed research so I still pose the question (What is the fastest way to iterate through individual characters in a string in C#?).

Occasionally I want to cycle through the characters of a string one-by-one, such as when parsing for nested tokens -- something which cannot be done with regular expressions. I am wondering what the fastest way is to iterate through the individual characters in a string, particularly very large strings.
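
To make the use case concrete, here is a minimal sketch of the kind of character-by-character scan that nested tokens call for. The class name `NestingScanner`, the method `MaxDepth`, and the choice of parentheses as the token pair are illustrative assumptions, not something from the original post.

```csharp
using System;

public static class NestingScanner
{
    // Returns the deepest nesting level of '(' ... ')' pairs in text,
    // or -1 if the parentheses are unbalanced.
    public static int MaxDepth(string text)
    {
        int depth = 0, maxDepth = 0;
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i]; // plain index accessor (method 1 in the tests)
            if (c == '(')
            {
                depth++;
                if (depth > maxDepth) maxDepth = depth;
            }
            else if (c == ')')
            {
                depth--;
                if (depth < 0) return -1; // closing token without an opener
            }
        }
        return depth == 0 ? maxDepth : -1;
    }

    public static void Main()
    {
        Console.WriteLine(MaxDepth("f(a, g(b, h(c)))")); // prints 3
        Console.WriteLine(MaxDepth("f(a))"));            // prints -1
    }
}
```

The loop body does very little work per character, which is why the cost of merely fetching each character matters for very large inputs.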

I did a bunch of testing myself and my results are below. However, many readers have much more in-depth knowledge of the .NET CLR and C# compiler than I do, so I don't know if I'm missing something obvious, or if I made a mistake in my test code. So I solicit your collective response. If anyone has insight into how the string indexer actually works, that would be very helpful. (Is it a C# language feature compiled into something else behind the scenes? Or something built into the CLR?)
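
One part of this can be checked directly: at the metadata level, the `s[i]` syntax on a string is backed by an indexed property named `Chars` on `System.String`. The sketch below only inspects that property through reflection; the class name `IndexerInspection` is a hypothetical helper, and how the JIT compiles calls to the property is not shown and not claimed here.

```csharp
using System;
using System.Reflection;

public static class IndexerInspection
{
    // Looks up the indexed property that backs the s[i] syntax on System.String.
    public static PropertyInfo GetStringIndexer()
    {
        return typeof(string).GetProperty("Chars");
    }

    public static void Main()
    {
        PropertyInfo indexer = GetStringIndexer();
        Console.WriteLine(indexer.PropertyType);                        // System.Char
        Console.WriteLine(indexer.GetIndexParameters().Length);         // 1
        Console.WriteLine(indexer.GetValue("abc", new object[] { 1 })); // b
    }
}
```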

The first method using a stream was taken directly from the accepted answer from the thread: how to generate a stream from a string?

Tests

longString is a 99.1 million character string consisting of 89 copies of the plain-text version of the C# language specification. Results shown are for 20 iterations. Where there is a 'startup' time (such as for the first iteration of the implicitly created array in method #3), I tested that separately, such as by breaking from the loop after the first iteration.
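
The post does not show its timing harness, so the following is only a sketch of how startup and iteration time could be measured separately with `Stopwatch`. The class name `IterationTiming`, the method `TimeCharArray`, the checksum, and the stand-in input string are all assumptions made for illustration.

```csharp
using System;
using System.Diagnostics;

public static class IterationTiming
{
    // Times one pass over text via a cached char array, reporting the
    // copy ("startup") and the loop ("iteration") separately.
    public static (double StartupMs, double IterationMs, int Checksum) TimeCharArray(string text)
    {
        var sw = Stopwatch.StartNew();
        char[] chars = text.ToCharArray();
        double startupMs = sw.Elapsed.TotalMilliseconds;

        sw.Restart();
        int checksum = 0;
        for (int i = 0; i < chars.Length; i++)
        {
            checksum += chars[i]; // use each char so the loop has an observable result
        }
        double iterationMs = sw.Elapsed.TotalMilliseconds;
        return (startupMs, iterationMs, checksum);
    }

    public static void Main()
    {
        // Stand-in input; the post used 89 copies of the C# specification.
        string text = new string('x', 1000000);
        const int runs = 20;
        double startup = 0, iteration = 0;
        for (int run = 0; run < runs; run++)
        {
            var result = TimeCharArray(text);
            startup += result.StartupMs;
            iteration += result.IterationMs;
        }
        Console.WriteLine($"startup {startup / runs:F2} ms, iteration {iteration / runs:F2} ms");
    }
}
```

Numbers from a harness like this depend heavily on build configuration and runtime version, so they should be compared only against other methods measured the same way.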

Results

From my tests, caching the string in a char array using the ToCharArray() method is the fastest way to iterate over the entire string. ToCharArray() is an upfront expense, and subsequent access to individual characters is slightly faster than the built-in index accessor.

                                                   milliseconds
                                        ---------------------------------
 Method                                 Startup  Iteration  Total  StdDev
--------------------------------------  -------  ---------  -----  ------
 1 index accessor                             0        602    602       3
 2 explicit convert ToCharArray             165        410    582       3
 3 foreach (c in string.ToCharArray)        168        455    623       3
 4 StringReader                               0       1150   1150      25
 5 StreamWriter => Stream                   405       1940   2345      20
 6 GetBytes() => StreamReader               385       2065   2450      35
 7 GetBytes() => BinaryReader               385       5465   5850      80
 8 foreach (c in string)                      0        960    960       4

Update: Per @Eric's comment, here are results for 100 iterations over a more normal 1.1 million character string (one copy of the C# spec). The indexer and char arrays are still fastest, followed by foreach (char in string), followed by the stream methods.

                                                   milliseconds
                                        ---------------------------------
 Method                                 Startup  Iteration  Total  StdDev
--------------------------------------  -------  ---------  -----  ------
 1 index accessor                             0        6.6    6.6    0.11
 2 explicit convert ToCharArray             2.4        5.0    7.4    0.30
 3 foreach (c in string.ToCharArray)        2.4        4.7    7.1    0.33
 4 StringReader                               0       14.0   14.0    1.21
 5 StreamWriter => Stream                   5.3       21.8   27.1    0.46
 6 GetBytes() => StreamReader               4.4       23.6   28.0    0.65
 7 GetBytes() => BinaryReader               5.0       61.8   66.8    0.79
 8 foreach (c in string)                      0       10.3   10.3    0.11

Code Used (tested separately; shown together for brevity)

// Each method was compiled and timed separately, which is why names such as
// strLength and stream repeat. c is a char declared outside the loops.
char c;

//1 index accessor
int strLength = longString.Length;
for (int i = 0; i < strLength; i++) { c = longString[i]; }

//2 explicit convert ToCharArray
int strLength = longString.Length;
char[] charArray = longString.ToCharArray();
for (int i = 0; i < strLength; i++) { c = charArray[i]; }

//3 foreach (c in string.ToCharArray)
foreach (char c in longString.ToCharArray()) { }

//4 use StringReader
int strLength = longString.Length;
StringReader sr = new StringReader(longString);
for (int i = 0; i < strLength; i++) { c = Convert.ToChar(sr.Read()); }

//5 StreamWriter => StreamReader
// Caveat: stream.Position is a byte offset into the MemoryStream, while
// strLength counts chars, so the loop bound below is only approximate.
int strLength = longString.Length;
MemoryStream stream = new MemoryStream();
StreamWriter writer = new StreamWriter(stream);
writer.Write(longString);
writer.Flush();
stream.Position = 0;
StreamReader str = new StreamReader(stream);
while (stream.Position < strLength) { c = Convert.ToChar(str.Read()); }

//6 GetBytes() => StreamReader
// Caveat: the bytes are UTF-16, but this StreamReader decodes as UTF-8 by default
// (absent a byte order mark); the byte-versus-char caveat from #5 also applies.
int strLength = longString.Length;
MemoryStream stream = new MemoryStream(Encoding.Unicode.GetBytes(longString));
StreamReader str = new StreamReader(stream);
while (stream.Position < strLength) { c = Convert.ToChar(str.Read()); }

//7 GetBytes() => BinaryReader
// Caveat: same byte-versus-char loop bound as #5.
int strLength = longString.Length;
MemoryStream stream = new MemoryStream(Encoding.Unicode.GetBytes(longString));
BinaryReader br = new BinaryReader(stream, Encoding.Unicode);
while (stream.Position < strLength) { c = br.ReadChar(); }

//8 foreach (c in string)
foreach (char c in longString) { }

Accepted answer:

I interpreted @CodeInChaos and Ben's notes as follows:

// Requires an unsafe context; strLength is longString.Length and c is a char, as above.
fixed (char* pString = longString) {
    char* pChar = pString;
    for (int i = 0; i < strLength; i++) {
        c = *pChar;
        pChar++;
    }
}

Execution for 100 iterations over the short string took 4.4 ms, with a standard deviation of < 0.1 ms.

Recommended answer

The fastest answer is to use C++/CLI: How to: Access Characters in a System::String

This approach iterates through the characters in-place in the string using pointer arithmetic. There are no copies, no implicit range checks, and no per-element function calls.

It's likely possible to get nearly the same performance from C# by writing an unsafe C# version of PtrToStringChars. (Only nearly, because C++/CLI doesn't require pinning.)

Something like the following (struck through in the original answer):

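unsafe char* PtrToStringContent(string s, out GCHandle pin)
{
    // Pin the string so the garbage collector cannot move it while the pointer is in use.
    pin = GCHandle.Alloc(s, GCHandleType.Pinned);
    // For a pinned string, AddrOfPinnedObject already points at the first character,
    // so no OffsetToStringData adjustment is added here.
    return (char*)pin.AddrOfPinnedObject().ToPointer();
}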



Do remember to call GCHandle.Free afterwards.
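
As a minimal sketch of that pin-then-free discipline, the handle can be released in a `finally` block so it is freed even if the loop throws. The class name `PinnedStringDemo` and the method `SumChars` are hypothetical, and the code assumes the project is compiled with unsafe code enabled.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinnedStringDemo
{
    // Sums the UTF-16 code units of s while it is pinned with a GCHandle.
    public static unsafe long SumChars(string s)
    {
        GCHandle pin = GCHandle.Alloc(s, GCHandleType.Pinned);
        try
        {
            // For a pinned string this address is that of the first character.
            char* p = (char*)pin.AddrOfPinnedObject().ToPointer();
            long sum = 0;
            for (int i = 0; i < s.Length; i++)
            {
                sum += p[i];
            }
            return sum;
        }
        finally
        {
            pin.Free(); // always release the handle, or the string stays pinned
        }
    }

    public static void Main()
    {
        Console.WriteLine(SumChars("abc")); // prints 294 (97 + 98 + 99)
    }
}
```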

CodeInChaos's comment points out that C# provides a syntactic sugar for this:

fixed(char* pch = s) { ... }
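
Filled out, the `fixed` form might look like the sketch below, which walks the string with a moving pointer and an end pointer rather than an index. The class name `FixedIterationDemo` and the method `CountChar` are illustrative assumptions, and the project must be compiled with unsafe code enabled.

```csharp
using System;

public static class FixedIterationDemo
{
    // Counts occurrences of target by walking the string through a pinned pointer.
    public static unsafe int CountChar(string s, char target)
    {
        int count = 0;
        fixed (char* pch = s)
        {
            char* p = pch;
            char* end = pch + s.Length; // one past the last character
            while (p < end)
            {
                if (*p == target) count++;
                p++;
            }
        }
        return count; // the string is unpinned automatically when the fixed block exits
    }

    public static void Main()
    {
        Console.WriteLine(CountChar("a(b(c))", '(')); // prints 2
    }
}
```

Compared with the GCHandle version, there is no handle to free by hand, since the pin lasts only for the duration of the `fixed` block.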
