Fast string operations


Problem description

I've been perf testing an application of mine and I've noticed that there
are a lot (and I mean A LOT -- megabytes and megabytes of 'em) of System.String
instances being created.

I've done some analysis and I'm led to believe (but can't yet quantitatively
establish as fact) that the two basic culprits are a lot of calls to:

    1.) if( someString.ToLower() == "somestring" )

and

    2.) if( someString != null && someString.Trim().Length > 0 )

ToLower() generates a new string instance, as does Trim().

I believe that these are getting called many times and churning out a bunch
of strings faster than the GC can collect them, or perhaps there's some
weird interning/caching thing going on. Regardless, the number of string
instances grows and grows. It gets bumped down occasionally, but it's
basically 5 steps forward, 1 back.

For reference, this is an ASP application calling into .NET ComVisible
objects. So I assume this uses the workstation GC, right?

Anyhow, I think that I can solve problem (1) with String.Compare(), which
can perform in-place case-insensitive comparisons without generating new
string instances.
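
A minimal sketch of the String.Compare() idea for problem (1). The class and method names (CaseInsensitiveCheck, IsSomeString) are hypothetical, and the choice of the invariant culture is an assumption; the original post does not say which culture rules are wanted. Note one behavioral difference: String.Compare accepts null arguments, whereas someString.ToLower() throws on null.

```csharp
using System;
using System.Globalization;

public static class CaseInsensitiveCheck
{
    // Hypothetical helper: true when value equals "somestring", ignoring case.
    // No intermediate lower-cased string is created.
    public static bool IsSomeString(string value)
    {
        // ignoreCase = true; invariant culture is an assumption made for this sketch.
        return String.Compare(value, "somestring", true, CultureInfo.InvariantCulture) == 0;
    }
}
```

On .NET 2.0 and later, String.Equals(a, b, StringComparison.OrdinalIgnoreCase) is another option when culture-sensitive rules are not needed.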

Problem (2), however, is more complicated. There doesn't appear to be a
TrimmedLength or any other method or property in the BCL that can give me the
length of a string, minus whitespace, without generating a new string
instance.

I suppose I could write some unsafe, or even unmanaged, code (which is what MSFT
did for all their string handling stuff inside System.String and using the
COMString stuff), but I'd like to try to avoid that, or at least use a
library that's already written and well tested.

Any thoughts?

Thanks in advance,
Chad Myers

Recommended answer

Chad,

For the first scenario, your solution should give you a performance increase.

For the second scenario, you should use reflection once to get a
reference to the internal static character array WhitespaceChars on the
string class. Then, you can write a method which will cycle through a
string passed to it, like so:

    public static bool TrimIsNullOrEmpty(string value)
    {
        // If null, then get out.
        if (value == null)
        {
            return true;
        }

        // Cycle through the characters in the string. If a character is not
        // found in the whitespace array, return false; otherwise, when done,
        // return true.
        // WhitespaceArray is the char[] obtained once via reflection.
        foreach (char c in value)
        {
            if (Array.IndexOf<char>(WhitespaceArray, c) == -1)
            {
                return false;
            }
        }

        // The string is empty or consists entirely of whitespace.
        return true;
    }

I used the generic version of the IndexOf method on the Array class in
order to eliminate boxing. Also, if you really want to squeeze out every
last bit of performance from this, you can take the WhitespaceArray and use
the characters as keys in a dictionary. The number of whitespace characters
is 25 (right now, that is). However, if your strings typically are padded
with spaces, then you could get a big speed boost by copying the array
initially, and then placing the space character as the first element in the
array (which would cause most of the calls to IndexOf to return very
quickly, probably quicker than a lookup in a dictionary).
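
A minimal sketch of the "space first" ordering described above. The Whitespace array here is a small hand-picked set chosen for illustration, not the framework's internal WhitespaceChars list, and the names WhitespaceCheck and IsNullOrBlank are hypothetical. It uses the generic Array.IndexOf, so it assumes .NET 2.0 or later.

```csharp
using System;

public static class WhitespaceCheck
{
    // Assumption: an illustrative subset of whitespace characters, with the
    // space character placed first so that space-padded strings match on the
    // first comparison. This is NOT the framework's internal list.
    private static readonly char[] Whitespace =
        { ' ', '\t', '\r', '\n', '\v', '\f', '\u00A0' };

    // True when value is null, empty, or made up only of characters in Whitespace.
    public static bool IsNullOrBlank(string value)
    {
        if (value == null)
        {
            return true;
        }

        foreach (char c in value)
        {
            // The generic overload avoids boxing the char.
            if (Array.IndexOf<char>(Whitespace, c) == -1)
            {
                return false;
            }
        }

        return true;
    }
}
```

Whether this beats a dictionary lookup depends on the input; measuring with representative strings is the only way to know.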

I am curious though, are you seeing a performance issue, or do you just
see the numbers and are worried about them? ASP.NET applications tend to
get in a nice groove with the GC over time.

Hope this helps.

--
- Nicholas Paldino [.NET/C# MVP]
- mv*@spam.guard.caspershouse.com

> 1.) if( someString.ToLower() == "somestring" )

FxCop will actually catch and report instances of this for you. It is my 2nd
favorite tool outside of Visual Studio.
> 2.) if( someString != null && someString.Trim().Length > 0 )

I would recommend using

    if (someString != null)
        someString = someString.Trim();
    else
        someString = "";

    if( someString.Length > 0 )

My assumption here is that you already intend to trim the string before it
is used.

--
Jonathan Allen

Nicholas,

Thanks for the quick reply. Unfortunately I'm not using .NET 2.0 (yet!), so
I can't use generics.

Would looping over chars like that slow things down significantly? Also, is
the char[] for each string cached with the string, or is a new one created
when you call things like ToCharArray() or foreach() on the string (not on
every loop iteration, but on the first iteration)? Wouldn't I just be
replacing a new string instance with a new char[] and not get any net gain
over just calling .Trim()?
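
A minimal sketch relevant to the question above, assuming only the string indexer and char.IsWhiteSpace, both of which are available without generics. Indexing a string reads its characters in place and does not create a char[], whereas ToCharArray() does allocate a copy. Using an explicit index loop also avoids depending on how foreach over a string is compiled. The names StringScan and TrimmedLength are hypothetical, and the set of characters char.IsWhiteSpace recognizes may differ slightly from what Trim() strips on a given framework version.

```csharp
public static class StringScan
{
    // Hypothetical helper: the length the string would have after trimming
    // leading and trailing whitespace, computed without allocating anything.
    public static int TrimmedLength(string value)
    {
        if (value == null)
        {
            return 0;
        }

        int start = 0;
        int end = value.Length - 1;

        // Skip leading whitespace.
        while (start <= end && char.IsWhiteSpace(value[start]))
        {
            start++;
        }

        // Skip trailing whitespace.
        while (end >= start && char.IsWhiteSpace(value[end]))
        {
            end--;
        }

        return end - start + 1;
    }
}
```

With this, the check in problem (2) could be written as TrimmedLength(someString) > 0, which also covers the null case.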

In your opinion, if I weren't against unsafe code, could I make this
significantly faster, or would it not make much difference?

As far as performance goes, on some of our clients' instances, memory growth is
rapid. It seems that the more memory they have, the faster it grows, which leads
me to believe that the GC is being lax since it has so much free memory and
doesn't see the need to collect aggressively. But it bothers our
clients, and they perceive this to be a memory leak.

I realize it's an education issue, but I want to make sure that I'm
educating them correctly, as opposed to just making up a B.S. excuse and
Jedi hand-waving about the GC stuff.

Also, it's not an ASP.NET application; it's an ASP app that used to call
into VB6 COM objects. We've replaced the VB6 objects with .NET objects
exposing a "compatibility layer" that has a ComVisible API that is identical
(though not binary compatible) to the old VB6 stuff. Late-bound clients
don't know the difference other than a different ProgID for the COM objects.

So we're dealing with the workstation GC, as far as I know (since only ASP.NET
uses the server GC unless you host the CLR yourself, from what I understand).
I'm not sure how I'd even do that in an ASP/COM-interop situation, but, assuming
it's possible, would writing our own CLR host to use the server GC help matters
at all?

Most of our clients' servers are dual-or-more processor boxes.

Thanks again,
Chad Myers



