Delphi有快速的GetToken例程吗? [英] Is There A Fast GetToken Routine For Delphi?

查看:1041
本文介绍了Delphi有快速的GetToken例程吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在我的程序中,我处理了数百万个具有特殊字符的字符串,例如|在每个字符串中分隔标记。我有一个函数返回第n个令牌,这是它:

  function GetTok(const Line:string; const Delim:string; const TokenNum:Byte):string; 
{LK 2007年2月12日 - 此功能已尽可能优化}
var
I,P,P2:integer;
begin
P2:= Pos(Delim,Line);
如果TokenNum = 1,则开始
如果P2 = 0,则
结果:=行
else
结果:= copy(Line,1,P2-1);
end
else begin
P:= 0; {防止警告}
对于I:= 2 to TokenNum do begin
P:= P2;
如果P = 0,则break;
P2:= PosEx(Delim,Line,P + 1);
结束
如果P = 0,那么
结果:=''
如果P2 = 0,则
结果:= copy(Line,P + 1,MaxInt)
else
结果:= copy(Line,P + 1,P2-P-1);
结束
结束{GetTok}

当我使用Delphi 4时,我开发了这个函数。它调用了非常有效的PosEx最初由Fastcode开发的例程现在包含在Delphi的StrUtils库中。



我最近升级到Delphi 2009,我的字符串都是Unicode。这个GetTok功能仍然可以正常工作。



我在Delphi 2009中经历了新的库,还有很多新的功能和补充。



但是我没有看到一个GetToken函数,就像我需要的任何新的Delphi库,在各种fastcode项目,我找不到任何东西与谷歌除了 Zarko Gajic's:Delphi Split / Tokenizer Functions 之外的其他搜索,不是按照我已经有的优化。



任何改进,甚至10%都会在我的程序中引人注目。我知道一个替代方法是StringLists,并且始终保持令牌分开,但是这是一个很大的开销记忆,我不知道我是否做了所有这些工作来转换是否会更快。



唔。所以经过这么长时间的谈话,我的问题真的是:



你知道GetToken例程的任何非常快的实现吗?汇编优化的版本会是理想的吗?



如果没有,有没有可以看到我的代码可能会有所改进的优化?






跟进:Barry Kelly提到了一年前我问的关于优化文件中行的解析的问题。那时候我甚至没想到我的GetTok例程,没有用于读取或解析。现在只有我看到我的GetTok例程的开销导致我问这个问题。直到卡尔·斯科特里奇和巴里的答案,我从来没有想过连接这两个。很明显,但是它没有注册。感谢您的指出。



是的,我的Delim是一个单一的字符,所以显然我有一些主要的优化我可以做。我在GetTok例程(上图)中使用Pos和PosEx使我无法使用字符搜索来加快字符搜索,而代码如下:



while(cp ^>#0)和(cp ^ <= Delim)do
Inc(cp);

  

我将通过大家的答案,尝试各种建议并进行比较。然后我会发布结果。






混淆:好的,现在我真的很困惑。



我把Carl和Barry的建议和PChars一起推荐,这是我的实现:

  function GetTok(const Line:string ; const Delim:string; const TokenNum:Byte):string; 
{LK 2007年2月12日 - 此函数已尽可能优化}
{LK 2009年11月7日 - 使用PChars而不是调用Pos和PosEx进行了优化}
{ http://stackoverflow.com/questions/1694001/is-there-a-fast-gettoken-routine-for-delphi}
var
I:integer;
PLine,PStart:PChar;
begin
PLine:= PChar(Line);
PStart:= PLine;
inc(PLine);
对于I:= 1 to TokenNum do begin
while(PLine ^##0)and(PLine ^&Delphi)do
inc(PLine);
if I = TokenNum then begin
SetString(Result,PStart,PLine - PStart);
break;
结束
如果PLine ^ =#0然后开始
结果:='';
break;
结束
inc(PLine);
PStart:= PLine;
结束
结束{GetTok}

在纸上,我不认为你可以做得比这更好。 p>

所以我把这两个例程都放在这个任务上,并用AQTime看看发生了什么。运行中包含了1,108,514个GetTok调用。



AQTime将原始例程定时在0.40秒。对Pos的百万电话花费了0.10秒。令牌半数= 1份,共有五十万次,花费了0.10秒。 60万个PosEx电话只需要0.03秒。



然后,我使用AQTime为相同的运行和完全相同的调用定时了我的新例程。 AQTime报告说,我的新快速例程花费了3.65秒,这是9倍。根据AQTime的罪魁祸首是第一个循环:

  while(PLine ^##)和(PLine ^<> ;> Delim)do 
inc(PLine);

执行了1800万次的while行以2.66秒的速度报告。执行1600万次的公司线被称为需要0.47秒。



现在我以为我知道这里发生了什么。我在去年提出的一个问题中遇到了与AQTime类似的问题:为什么是CharInSet比Case语句更快?



再次是Barry Kelly引导了我的观点。基本上,像AQTime这样的测试分析器不一定要做微型优化工作。它增加了每一行的开销,这可能会使这些数字中清楚地显示的结果变成沼泽。在我的新优化代码中执行的3400万条线路压倒了原始代码的数百万行,Pos和PosEx例程显然很少或没有开销。



Barry给了我一个使用QueryPerformanceCounter的代码示例来检查他是否正确,在这种情况下他是。



好的,我们现在用QueryPerformanceCounter来做同样的事情来证明我的新例程比AQTime说的更快,不会慢9倍。所以这里我去:

  function TimeIt(const Title:string):double; 
var i:整数;
开始,完成,频率:Int64;
秒:双;
begin
QueryPerformanceCounter(start);
for i:= 1到250000 do
GetTokOld('这是一个字符串|需要|解析','|',1);
为i:= 1到250000 do
GetTokOld('这是一个字符串|需要|解析','|',2);
为i:= 1到250000 do
GetTokOld('这是一个字符串|需要|解析','|',3);
for i:= 1到250000 do
GetTokOld('这是一个字符串|需要|解析','|',4);
QueryPerformanceCounter(finish);
QueryPerformanceFrequency(freq);
秒:=(finish - start)/ freq;
结果:=秒;
结束

所以这将测试1,000,000个电话到GetTok。



我的旧程序与Pos和PosEx的电话占用了0.29秒。
新的PChars 2.07秒。



现在我完全被迷惑了!任何人都可以告诉我为什么PChar程序不仅速度慢了,而且慢了8到9倍!






神秘解决! Andreas在他的答案中表示将Delim参数从一个字符串更改为Char。我总是只使用一个Char,所以至少对我来说这是非常有可能的。我感到惊讶的是发生了什么。



100万次电话的时间从1.88秒下降到了.22秒。



令人惊讶的是,当我将原来的Pos​​ / PosEx例程的时间从.29延长到.44秒,当我将Delim参数更改为Char时。



坦白说,我对Delphi的优化器感到失望。那个Delim是一个常数参数。优化器应该注意到循环中发生相同的转换,并且应该将其移出,以便它只能执行一次。



双重检查我的代码生成参数,是的,我确实有优化True和String格式检查关闭。



底线是安德烈修复的新PChar例程比我原来的快了25%对于.29)。



我还想跟进这里的其他评论并进行测试。






关闭优化并打开String格式检查只会将时间从.22增加到.30。它与原始文件相同。



使用汇编程序代码或调用Pos或PosEx编写的例程的优点是它们不受任何代码你设置的生成选项。他们将一直运行相同的方式,预先优化和不blo肿的方式。



我在过去几天重申,最好的方式来比较代码微型优化是在CPU窗口中查看并比较汇编代码。如果Embarcadero可以使窗口更方便,那么这将是很好的,并允许我们将部分复制到剪贴板或打印其中的部分。



另外,我不公平在这篇文章之前,AQTime砰的一声,认为为我的新例程添加的额外时间仅仅是因为它添加的仪器。现在我回去查看Char参数而不是String,while循环下降到.30秒(从2.66开始),而inc行下降到.14秒(从.47)。奇怪的是,线路也会下降。但是我已经从所有这些测试中磨损了。






我把卡尔的想法用字符循环,并重写那个代码与这个想法。它进一步改善,从.22秒下降到19秒。所以这里现在是迄今为止最好的:

  function GetTok(const Line:string; const Delim:Char; const TokenNum:Byte ):string; 
{LK 2009年11月8日 - 使用PChars重新优化,而不是调用Pos和PosEx}
{见; http://stackoverflow.com/questions/1694001/is-there-a-fast-gettoken-routine-for-delphi}
var
I,CurToken:Integer;
PLine,PStart:PChar;
begin
CurToken:= 1;
PLine:= PChar(Line);
PStart:= PLine;
对于I:= 1到length(Line)do begin
如果PLine ^ = Delim然后开始
如果CurToken = TokenNum然后
break
else begin
CurToken:= CurToken + 1;
inc(PLine);
PStart:= PLine;
结束
end
else
inc(PLine);
结束
如果CurToken = TokenNum然后
SetString(Result,PStart,PLine - PStart)
else
结果:='';
结束

仍然可能会有一些小的优化,例如CurToken = Tokennum比较,应该是相同的类型,整数或字节,以较快者为准。



但是现在我很高兴。



再次感谢StackOverflow Delphi社区。

解决方案

函数(带有PChar的函数)应该将Delim声明为 Char ,而不是 String 。在您当前的实现中,编译器必须将PLine ^ char转换为字符串以将其与Delim进行比较。而这种情况发生在一个紧张的循环中,这是一个巨大的性能命中。

  function GetTok(const Line:string; const Delim:Char {< ===}; const TokenNum:Byte):string; 
{LK 2007年2月12日 - 此函数已尽可能优化}
{LK 2009年11月7日 - 使用PChars而不是调用Pos和PosEx进行了优化}
{ http://stackoverflow.com/questions/1694001/is-there-a-fast-gettoken-routine-for-delphi}
var
I:integer;
PLine,PStart:PChar;
begin
PLine:= PChar(Line);
PStart:= PLine;
inc(PLine);
对于I:= 1 to TokenNum do begin
while(PLine ^##0)and(PLine ^&Delphi)do
inc(PLine);
if I = TokenNum then begin
SetString(Result,PStart,PLine - PStart);
break;
结束
如果PLine ^ =#0然后开始
结果:='';
break;
结束
inc(PLine);
PStart:= PLine;
结束
结束{GetTok}


In my program, I process millions of strings that have a special character, e.g. "|" to separate tokens within each string. I have a function to return the n'th token, and this is it:

function GetTok(const Line: string; const Delim: string; const TokenNum: Byte): string;
{ LK Feb 12, 2007 - This function has been optimized as best as possible }
var
 I, P, P2: integer;
begin
  P2 := Pos(Delim, Line);
  if TokenNum = 1 then begin
    if P2 = 0 then
      Result := Line
    else
      Result := copy(Line, 1, P2-1);
  end
  else begin
    P := 0; { To prevent warnings }
    for I := 2 to TokenNum do begin
      P := P2;
      if P = 0 then break;
      P2 := PosEx(Delim, Line, P+1);
    end;
    if P = 0 then
      Result := ''
    else if P2 = 0 then
      Result := copy(Line, P+1, MaxInt)
    else
      Result := copy(Line, P+1, P2-P-1);
  end;
end; { GetTok }

I developed this function back when I was using Delphi 4. It calls the very efficient PosEx routine that was originally developed by Fastcode and is now included in the StrUtils library of Delphi.

I recently upgraded to Delphi 2009 and my strings are all Unicode. This GetTok function still works and still works well.

I have gone through the new libraries in Delphi 2009 and there are many new functions and additions to it.

But I have not seen a GetToken function like I need in any of the new Delphi libraries, in the various fastcode projects, and I can't find anything with a Google search other than Zarko Gajic's: Delphi Split / Tokenizer Functions, which is not as optimized as what I already have.

Any improvement, even 10% would be noticeable in my program. I know an alternative is StringLists and to always keep the tokens separate, but this has a big overhead memory-wise and I'm not sure if I did all that work to convert whether it would be any faster.

Whew. So after all this long winded talk, my question really is:

Do you know of any very fast implementations of a GetToken routine? An assembler optimized version would be ideal?

If not, are there any optimizations that you can see to my code above that might make an improvement?


Followup: Barry Kelly mentioned a question I asked a year ago about optimizing the parsing of the lines in a file. At that time I hadn't even thought of my GetTok routine which was not used for the that read or parsing. It is only now that I saw the overhead of my GetTok routine which led me to ask this question. Until Carl Smotricz and Barry's answers, I had never thought of connecting the two. So obvious, but it just didn't register. Thanks for pointing that out.

Yes, my Delim is a single character, so obviously I have some major optimization I can do. My use of Pos and PosEx in the GetTok routine (above) blinded me to the idea that I can do it faster with a character by character search instead, with bits of code like:

      while (cp^ > #0) and (cp^ <= Delim) do    
        Inc(cp);

I'm going to go through everyone's answers and try the various suggestions and compare them. Then I'll post the results.


Confusion: Okay, now I'm really perplexed.

I took Carl and Barry's recommendation to go with PChars, and here is my implementation:

function GetTok(const Line: string; const Delim: string; const TokenNum: Byte): string;
{ LK Feb 12, 2007 - This function has been optimized as best as possible }
{ LK Nov 7, 2009 - Reoptimized using PChars instead of calls to Pos and PosEx }
{ See; http://stackoverflow.com/questions/1694001/is-there-a-fast-gettoken-routine-for-delphi }
var
 I: integer;
 PLine, PStart: PChar;
begin
  PLine := PChar(Line);
  PStart := PLine;
  inc(PLine);
  for I := 1 to TokenNum do begin
    while (PLine^ <> #0) and (PLine^ <> Delim) do
      inc(PLine);
    if I = TokenNum then begin
      SetString(Result, PStart, PLine - PStart);
      break;
    end;
    if PLine^ = #0 then begin
      Result := '';
      break;
    end;
    inc(PLine);
    PStart := PLine;
  end;
end; { GetTok }

On paper, I don't think you can do much better than this.

So I put both routines to the task and used AQTime to see what's happening. The run I had included 1,108,514 calls to GetTok.

AQTime timed the original routine at 0.40 seconds. The million calls to Pos took 0.10 seconds. A half a million of the TokenNum = 1 copies took 0.10 seconds. The 600,000 PosEx calls only took 0.03 seconds.

Then I timed my new routine with AQTime for the same run and exactly the same calls. AQTime reports that my new "fast" routine took 3.65 seconds, which is 9 times as long. The culprit according to AQTime was the first loop:

     while (PLine^ <> #0) and (PLine^ <> Delim) do
       inc(PLine);

The while line, which was executed 18 million times, was reported at 2.66 seconds. The inc line, executed 16 million times, was said to take 0.47 seconds.

Now I thought I knew what was happening here. I had a similar problem with AQTime in a question I posed last year: Why is CharInSet faster than Case statement?

Again it was Barry Kelly who clued me in. Basically, an instrumenting profiler like AQTime does not necessarily do the job for microoptimization. It adds an overhead to each line which may swamp the results which is shown clearly in these numbers. The 34 million lines executed in my new "optimized code" overwhelm the several million lines of my original code, with apparently little or no overhead from the Pos and PosEx routines.

Barry gave me a sample of code using QueryPerformanceCounter to check that he was correct, and in that case he was.

Okay, so let's do the same now with QueryPerformanceCounter to prove that my new routine is faster and not 9 times slower as AQTime says it is. So here I go:

function TimeIt(const Title: string): double;
var  i: Integer;
  start, finish, freq: Int64;
  Seconds: double;
begin
  QueryPerformanceCounter(start);
  for i := 1 to 250000 do
    GetTokOld('This is a string|that needs|parsing', '|', 1);
  for i := 1 to 250000 do
    GetTokOld('This is a string|that needs|parsing', '|', 2);
  for i := 1 to 250000 do
    GetTokOld('This is a string|that needs|parsing', '|', 3);
  for i := 1 to 250000 do
    GetTokOld('This is a string|that needs|parsing', '|', 4);
  QueryPerformanceCounter(finish);
  QueryPerformanceFrequency(freq);
  Seconds := (finish - start) / freq;
  Result := Seconds;
end;

So this will test 1,000,000 calls to GetTok.

My old procedure with the Pos and PosEx calls took 0.29 seconds. The new one with PChars took 2.07 seconds.

Now I am completely befuddled! Can anyone tell me why the PChar procedure is not only slower, but is 8 to 9 times slower!?


Mystery solved! Andreas said in his answer to change the Delim parameter from a string to a Char. I'll always be using just a Char, so at least for my implementation this is very possible. I was amazed at what happened.

The time for the 1 million calls went down from 1.88 seconds to .22 seconds.

And surprisingly, the time for my original Pos/PosEx routine went UP from .29 to .44 seconds when I changed it's Delim parameter to a Char.

Frankly, I'm disappointed by Delphi's optimizer. That Delim is a constant parameter. The optimizer should have noticed that the same conversion is happening within the loop and should have moved it out so that it would only be done once.

Double checking my Code generation parameters, yes I do have Optimization True and String format checking Off.

Bottom line is that the new PChar routine with Andrea's fix is about 25% faster than my original (.22 versus .29).

I still want to follow up on the other comments here and test them out.


Turning off optimization and turning on String format checking only increases the time from .22 to .30. It adds about the same to the original.

The advantage to using assembler code, or calling routines written in assembler like Pos or PosEx is that they are NOT subject to what code generation options you have set. They will always run the same way, a pre-optimized and non-bloated way.

I have reaffirmed in the last couple of days, that the best way to compare code for microoptimization is to look at and compare the Assembler code in the CPU window. It would be nice if Embarcadero could make that window a bit more convenient, and allow us to copy portions to the clipboard or to print sections of it.

Also, I unfairly slammed AQTime earlier in this post, thinking that the extra time added for my new routine was solely because of the instrumentation it added. Now that I go back and check with the Char parameter instead of String, the while loop is down to .30 seconds (from 2.66) and the inc line is down to .14 seconds (from .47). Strange that the inc line would go down as well. But I'm getting worn out from all this testing already.


I took Carl's idea of looping by characters, and rewrote that code with that idea. It makes another improvement, down to .19 seconds from .22. So here is now the best so far:

function GetTok(const Line: string; const Delim: Char; const TokenNum: Byte): string;
{ LK Nov 8, 2009 - Reoptimized using PChars instead of calls to Pos and PosEx }
{ See; http://stackoverflow.com/questions/1694001/is-there-a-fast-gettoken-routine-for-delphi }
var
  I, CurToken: Integer;
  PLine, PStart: PChar;
begin
  CurToken := 1;
  PLine := PChar(Line);
  PStart := PLine;
  for I := 1 to length(Line) do begin
    if PLine^ = Delim then begin
      if CurToken = TokenNum then
        break
      else begin
        CurToken := CurToken + 1;
        inc(PLine);
        PStart := PLine;
      end;
    end
    else
      inc(PLine);
  end;
  if CurToken = TokenNum then
    SetString(Result, PStart, PLine - PStart)
  else
    Result := '';
end;

There still may be some minor optimizations to this, such as the CurToken = Tokennum comparison, which should be the same type, Integer or Byte, whichever is faster.

But let's say, I'm happy now.

Thanks again to the StackOverflow Delphi community.

解决方案

Your new function (the one with PChar) should declare "Delim" as Char and not as String. In your current implementation the compiler has to convert the PLine^ char into a string to compare it with "Delim". And that happens in a tight loop resulting is an enormous performance hit.

function GetTok(const Line: string; const Delim: Char{<<==}; const TokenNum: Byte): string;
{ LK Feb 12, 2007 - This function has been optimized as best as possible }
{ LK Nov 7, 2009 - Reoptimized using PChars instead of calls to Pos and PosEx }
{ See; http://stackoverflow.com/questions/1694001/is-there-a-fast-gettoken-routine-for-delphi }
var
 I: integer;
 PLine, PStart: PChar;
begin
  PLine := PChar(Line);
  PStart := PLine;
  inc(PLine);
  for I := 1 to TokenNum do begin
    while (PLine^ <> #0) and (PLine^ <> Delim) do
      inc(PLine);
    if I = TokenNum then begin
      SetString(Result, PStart, PLine - PStart);
      break;
    end;
    if PLine^ = #0 then begin
      Result := '';
      break;
    end;
    inc(PLine);
    PStart := PLine;
  end;
end; { GetTok }

这篇关于Delphi有快速的GetToken例程吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆