Pos() within utf8 string boundaries

Question

I'd like a version of Pos() that searches only within boundaries I specify in the source string, rather than searching the entire string.

Let's say I have a string that is 100 chars long, and I want to perform the Pos only between the 5th and 20th characters of the (Unicode/UTF-8) string.

The code should be adapted from the ASM FastCode implementation in Delphi and obviously avoid pre-copying the portion of the string to a temporary one, as the purpose is to make it faster than that.

My situation:

I have a string which is accessed many times, and each time a portion of it is copied to a temporary string, then a Pos is performed on that copy. I want to avoid the intermediate copy every time and instead perform the Pos within the boundaries I specify.
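
For reference, the copy-based pattern described here might look like the sketch below. `PosViaCopy` is a hypothetical name introduced only for illustration, and the sketch assumes `Offset >= 1` and inclusive, 1-based boundaries; it is the baseline to be avoided, not a proposed solution.

```pascal
program CopyBaseline;
{$APPTYPE CONSOLE}

{ Hypothetical helper (not from the question): searches for SubStr in
  Str[Offset..EndPos] by copying that range into a temporary string first. }
function PosViaCopy(const SubStr, Str: UnicodeString; Offset, EndPos: Integer): Integer;
var
  Tmp: UnicodeString;
begin
  { Allocates and fills a new string on every call. }
  Tmp := Copy(Str, Offset, EndPos - Offset + 1);
  Result := Pos(SubStr, Tmp);
  { Map the index in Tmp back to an index in Str. }
  if Result > 0 then
    Inc(Result, Offset - 1);
end;

begin
  Writeln(PosViaCopy('fox', 'The quick brown fox jumps', 5, 20)); // prints 17
end.
```

The `Copy` call is the cost in question: it allocates and copies `EndPos - Offset + 1` characters before the search even starts.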

Edit: question edited after a new one was deemed a duplicate.

I would still like a solution that expands on the current XE3 FastCode assembly implementation, as that would fit my goal here.

Recommended answer

Here is an alternative that is not based on asm. It will also work on a 64-bit application.

function PosExUBound(const SubStr, Str: UnicodeString; Offset, EndPos: Integer): Integer; overload;
var
  I, LIterCnt, L, J: NativeInt;
  PSubStr, PS: PWideChar;
begin
  L := Length(SubStr);
  if (EndPos > Length(Str)) then
    EndPos := Length(Str);
  { Calculate the number of possible iterations. Not valid if Offset < 1. }
  LIterCnt := EndPos - Offset - L + 1;

  { Only continue if the number of iterations is positive or zero (there is space to check) }
  if (Offset > 0) and (LIterCnt >= 0) and (L > 0) then
  begin
    PSubStr := PWideChar(SubStr);
    PS := PWideChar(Str);
    Inc(PS, Offset - 1);

    { Compare backwards, starting from the last character of SubStr. }
    Dec(L);
    I := 0;
    J := L;
    repeat
      if PS[I + J] <> PSubStr[J] then
      begin
        Inc(I);
        J := L;
        Dec(LIterCnt);
        if (LIterCnt < 0)
          then Exit(0);
      end
      else
      if (J > 0) then
        Dec(J)
      else
        Exit(I + Offset);
    until false;
  end;

  Result := 0;
end;

I will leave it as an exercise to implement the AnsiString overloaded version.
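
One possible shape for that overload, as a minimal sketch rather than a tested implementation: it assumes the same inclusive, 1-based `Offset`/`EndPos` semantics as the UnicodeString version and only swaps the pointer type to `PAnsiChar`. Note that for AnsiString (and UTF-8 data held in one) the boundaries count bytes, not code points.

```pascal
program AnsiPosDemo;
{$APPTYPE CONSOLE}

{ Sketch of an AnsiString overload; mirrors the UnicodeString version. }
function PosExUBound(const SubStr, Str: AnsiString; Offset, EndPos: Integer): Integer; overload;
var
  I, LIterCnt, L, J: NativeInt;
  PSubStr, PS: PAnsiChar;
begin
  Result := 0;
  L := Length(SubStr);
  if EndPos > Length(Str) then
    EndPos := Length(Str);
  { Number of start positions left to try after the first one. }
  LIterCnt := EndPos - Offset - L + 1;
  if (Offset > 0) and (LIterCnt >= 0) and (L > 0) then
  begin
    PSubStr := PAnsiChar(SubStr);
    PS := PAnsiChar(Str);
    Inc(PS, Offset - 1);
    Dec(L);
    I := 0;
    J := L;
    repeat
      if PS[I + J] <> PSubStr[J] then
      begin
        { Mismatch: move to the next start position. }
        Inc(I);
        J := L;
        Dec(LIterCnt);
        if LIterCnt < 0 then
          Exit(0);
      end
      else if J > 0 then
        Dec(J)              { Match so far: compare the previous byte. }
      else
        Exit(I + Offset);   { All bytes matched. }
    until False;
  end;
end;

var
  S: AnsiString;
begin
  S := 'The quick brown fox jumps';
  Writeln(PosExUBound('fox', S, 5, 20)); // prints 17
  Writeln(PosExUBound('fox', S, 5, 18)); // prints 0: the match would end at 19, past EndPos
end.
```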

BTW, the purepascal parts of the Pos() functions in XE3 are, to put it mildly, poorly written. See QC111103, "Inefficient loop in Pos() for purepascal". Give it a vote if you like.
