Pos() within utf8 string boundaries
Question
I'd like a version of Pos() that searches only within boundaries I specify in the source string, rather than searching the entire data.
Let's say I have a string that is 100 chars long. I want to perform the Pos only between the 5th and 20th characters of the (unicode/utf8) string.
The code should be adapted from the ASM FastCode implementation in Delphi and obviously avoid pre-copying the portion of the string to a temporary one, as the purpose is to make it faster than that.
My situation:
I have a string which is accessed many times, and each time a portion of it is copied to another temporary string, then a Pos is performed on that copy. I want to avoid the intermediary copy every time, and instead perform the Pos within the boundaries I specify.
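For reference, the copy-then-search approach described above might look roughly like the sketch below. The name PosViaCopy and its parameters are hypothetical and used only for illustration; EndPos is treated as inclusive.

```pascal
// Hypothetical baseline: copies the range StartPos..EndPos, then searches the copy.
function PosViaCopy(const SubStr, Str: string; StartPos, EndPos: Integer): Integer;
var
  Tmp: string;
begin
  // Copy allocates a temporary string on every call; this is the cost to avoid.
  Tmp := Copy(Str, StartPos, EndPos - StartPos + 1);
  Result := Pos(SubStr, Tmp);
  // Translate the index found in the copy back to an index in Str.
  if Result > 0 then
    Inc(Result, StartPos - 1);
end;
```

For the example in the question this would be called as PosViaCopy(SubStr, Str, 5, 20).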
Edit: question edited after the new one was deemed a duplicate.
I would still like a solution that expands on the current XE3 FastCode assembly implementation, as that would fit my goal here.
Recommended answer
Here is an alternative that is not based on asm. It will also work in a 64-bit application.
function PosExUBound(const SubStr, Str: UnicodeString; Offset, EndPos: Integer): Integer; overload;
var
  I, LIterCnt, L, J: NativeInt;
  PSubStr, PS: PWideChar;
begin
  L := Length(SubStr);
  if (EndPos > Length(Str)) then
    EndPos := Length(Str);
  { Calculate the number of possible iterations. Not valid if Offset < 1. }
  LIterCnt := EndPos - Offset - L + 1;
  { Only continue if the number of iterations is positive or zero (there is space to check) }
  if (Offset > 0) and (LIterCnt >= 0) and (L > 0) then
  begin
    PSubStr := PWideChar(SubStr);
    PS := PWideChar(Str);
    Inc(PS, Offset - 1);
    Dec(L);
    I := 0;
    J := L;
    repeat
      if PS[I + J] <> PSubStr[J] then
      begin
        Inc(I);
        J := L;
        Dec(LIterCnt);
        if (LIterCnt < 0) then
          Exit(0);
      end
      else if (J > 0) then
        Dec(J)
      else
        Exit(I + Offset);
    until False;
  end;
  Result := 0;
end;
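A short usage sketch may help show how the bounds behave. It assumes the UnicodeString PosExUBound above is declared earlier in the same unit; the procedure name and sample strings are made up for illustration. Offset and EndPos are both 1-based and inclusive, and a match counts only if it lies entirely inside that range.

```pascal
// Assumes PosExUBound (UnicodeString version) is declared earlier in the same unit.
procedure DemoPosExUBound;
var
  S, Lo, O: UnicodeString;
begin
  S := 'Hello world';
  Lo := 'lo';
  O := 'o';
  WriteLn(PosExUBound(Lo, S, 1, 5));  // 4: 'lo' occupies positions 4..5
  WriteLn(PosExUBound(Lo, S, 1, 4));  // 0: the match would end past EndPos
  WriteLn(PosExUBound(O, S, 6, 11));  // 8: the 'o' at position 5 lies before Offset
end;
```

The returned index is always relative to the start of Str, not to Offset, so it can be used directly with the original string.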
I will leave it as an exercise to implement the AnsiString overloaded version.
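One way the AnsiString overload might look is sketched below. It is not part of the original answer; it simply mirrors the UnicodeString version with PAnsiChar pointers. Note that for AnsiString data (including UTF-8 encoded text) Offset and EndPos count bytes rather than characters, so with multi-byte content the caller is assumed to pass boundaries that do not fall inside an encoded character.

```pascal
// Sketch of an AnsiString overload; same algorithm as the UnicodeString version.
// Offset and EndPos are 1-based, inclusive, and counted in AnsiChar units (bytes).
function PosExUBound(const SubStr, Str: AnsiString; Offset, EndPos: Integer): Integer; overload;
var
  I, LIterCnt, L, J: NativeInt;
  PSubStr, PS: PAnsiChar;
begin
  Result := 0;
  L := Length(SubStr);
  if EndPos > Length(Str) then
    EndPos := Length(Str);
  // Number of additional start positions to try after the first one.
  LIterCnt := EndPos - Offset - L + 1;
  if (Offset > 0) and (LIterCnt >= 0) and (L > 0) then
  begin
    PSubStr := PAnsiChar(SubStr);
    PS := PAnsiChar(Str);
    Inc(PS, Offset - 1);
    Dec(L);
    I := 0;
    J := L;
    repeat
      // Compare from the last character of SubStr backwards.
      if PS[I + J] <> PSubStr[J] then
      begin
        Inc(I);
        J := L;
        Dec(LIterCnt);
        if LIterCnt < 0 then
          Exit(0);
      end
      else if J > 0 then
        Dec(J)
      else
        Exit(I + Offset);
    until False;
  end;
end;
```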
BTW, the purepascal parts of the Pos() functions in XE3 are, to put it mildly, poorly written. See QC111103, "Inefficient loop in Pos() for purepascal". Give it a vote if you like.