德尔福归并为字符串数组 [英] delphi mergesort for string arrays

查看:187
本文介绍了德尔福归并为字符串数组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

发现这个codeD归并在 http://www.explainth.at/ EN /德尔福/ dsort.shtml (网站下来,但尝试Wayback机器或本网站的http://read.pudn.com/downloads192/source$c$c/delphi_control/901147/Sorts.pas__.htm)但实质上定义的阵列并非为字符串的数组。 键入TSortArray =阵列[0..8191]双; 我想通过字符串,将可能消除重复(这将是联盟?)的和preserve原来的顺序如果可能的话后诉诸回原来的索引位置减去当然副本的数组(原指数) 的,以便阵列可以被传递回用于进一步处理。我使用的字符串非常大的文件数以百万计的字符串(14〜30亿美元),以便的TStringList 是不是一种选择。对于这些大文件最好的选择是使用串的阵列或阵列的记录(或者单链表??)和排序与大量的数据变得稳定算法

  1. 我怎样才能改变这种采取字符串?数组
  2. 怎样才可以进一步的修改,删除或至少标记复制?
  3. 是否可以保存原始的指数,把后面的字符串在原来的位置?
  4. 是字符串数组或记录了大量的字符串更好的阵列相比,单链表?

问题的重要性排列,所以,如果你回答问题,1号这是唯一的罚款。预先感谢您为您的所有投入。


 程序归并(VAR瓦尔斯:TSortArray; ACount:整数);
VAR AVals:TSortArray;

  程序合并(ALO,际,AHI:整数);
  变种I,J,K,M:整数;
  开始
    I:= 0;
    对于j:= ALO来之际做
    开始
      AVals [我]:=丘壑[J]。
      增量(ⅰ);
      //复制下半部或丘壑到临时数组AVals
    结束;

    我:= 0;记者:=绵绵+ 1; K:= ALO;∥焦可经过未定义循环!
    而((K< j)和(J< = AHI))做
    如果(AVals [1]  - ;丘壑[J]。)然后
    开始
      瓦尔斯[K]:= AVals [I]
      增量(ⅰ);增量(k)的;
    结束其他
    开始
      瓦尔斯[K]:=丘壑[J]。
      增量(k)的;增量(j)条;
    结束;
    {找到瓦尔斯或AVals下一个最大的价值,并将其复制到
     正确的位置。}

    对于M:= K到j  -  1做
    开始
      瓦尔斯[M]:= AVals [I]
      增量(ⅰ);
    结束;
    //复制回任何剩余的,未排序的,元素
  结束;

  程序PerformMergeSort(ALO,AHI:整数);
  VAR在一片:整数​​;
  开始
    如果(ALO< AHI),然后
    开始
      在一片:=(ALO + AHI)SHR 1;
      PerformMergeSort(ALO,在一片);
      PerformMergeSort(在一片+ 1,AHI);
      合并(ALO,际,AHI); < ====传递数组作为字符串会导致AV击穿这里
    结束;
  结束;

开始
  SetLength(AVals,ACount);
  PerformMergeSort(0,ACount  -  1);
结束;
 

解决方案

回答第二个问题:修改归并具有重复删除。应该为字符串。

  //返回新的有效长度
功能MergeSortRemoveDuplicates(VAR瓦尔斯:整数数组):整数;
变种
  AVals:整数数组;

   //最后一个有效元素的回报率指标
  功能合并(I0,I1,J0,J1:整数):整数;
  变种
    I,J,K,LC:整数;
  开始
    LC:= I1  -  I0;
    对于i:= 0至LC做
      AVals [我]:=丘壑[I + I0]。
      //复制下半部或丘壑到临时数组AVals

    K:= I0;
    I:= 0;
    记者:= J0;
    而((I< = LC)和(J< = J1))做
    如果(AVals [1]  - ;丘壑[J]。)然后开始
      瓦尔斯[K]:= AVals [I]
      增量(ⅰ);
      增量(k)的;
    结束否则如果(AVals [I]>丘壑[J]。)然后开始
      瓦尔斯[K]:=丘壑[J]。
      增量(k)的;
      INC。(J);
    其他结束开始//重复
      瓦尔斯[K]:= AVals [I]
      增量(ⅰ);
      INC。(J);
      增量(k)的;
    结束;

    //复制休息
    而I< = LC也开始
      瓦尔斯[K]:= AVals [I]
      增量(ⅰ);
      增量(k)的;
    结束;

    如果K<> Ĵ然后
      而J< = J1做的开始
        瓦尔斯[K]:=丘壑[J]。
        增量(k)的;
        INC。(J);
      结束;

    结果:= K  -  1;
  结束;

 //最后一个有效元素的回报率指标

  功能PerformMergeSort(ALO,AHI:整数):整数; //返回
  变种
    在一片,I1,J1:整数;
  开始

  //这将是明智的,使用插入排序时(AHI  -  ALO)小(约32-100)
    如果(ALO< AHI),然后
    开始
      在一片:=(ALO + AHI)SHR 1;
      I1:= PerformMergeSort(ALO,在一片);
      J1:= PerformMergeSort(在一片+ 1,AHI);
      结果:=合并(ALO,I1,在一片+ 1,J1);
    结束其他
      结果:= ALO;
  结束;

开始
  SetLength(AVals,长度(丘壑)DIV 2 + 1);
  结果:= 1 + PerformMergeSort(0,高(瓦尔斯));
结束;


//短的测试
变种
  答:数组整数;
  我,NewLen:整数;
开始
  随机化;
  SetLength(A,12);
  对于i:= 0为高(A)办
    A [1]:=随机(10);
   NewLen:= MergeSortRemoveDuplicates(A);
   SetLength(A,NewLen);
   对于i:= 0为高(A)办
     Memo1.Lines.Add(IntToStr(A [1]))
  结束;
 

简单的修改字符串:

 函数MergeSortRemoveDuplicates(VAR瓦尔斯:数组字符串):整数;
变种
  AVals:字符串数组;
 

和测试案例:

 变种
  清单:TStringList中;
  编曲:字符串数组;
  I,N:整数;
开始
  与TStringList.Create做尝试
    LoadFromFile(F:\ m2.txt'); //包含一些相同的字符串
    SetLength(编曲,计数);
    对于i:= 0计数 -  1做
      编曲[我]:=字符串[我]
  最后
    自由
  结束;
  N:= MergeSortRemoveDuplicates(编曲);
  对于i:= 0到n  -  1做
    Memo1.Lines.Add(ARR [I]);
结束;
 

Found this coded mergesort on http://www.explainth.at/en/delphi/dsort.shtml (site down but try wayback machine or this site: http://read.pudn.com/downloads192/sourcecode/delphi_control/901147/Sorts.pas__.htm) but essentially the array defined was not for an array of string. type TSortArray = array[0..8191] of Double; I want to pass an array of string that would possibly eliminate duplicates (this would be Union?) and preserve original order if possible for later resorting it back to original index position minus the duplicates of course (original index) so array can be passed back for further processing. I am using very large files of strings with millions of strings (14 to 30 million) so TStringList is not an option. Best option for these large files is to use arrays of string or arrays of records (or maybe single linked list??) and sort with stable algorithm made for large amount of data.

  1. How can I change this to take array of string?
  2. How can it be further modified to delete or at least mark duplicates?
  3. Is it possible to store original index number to place back strings in original position?
  4. Are arrays of string or arrays of record better for large number of strings as compared to a single linked list?

Questions are listed in order of importance so if you answer question number 1 only that is fine. Thank you in advance for all your input.


procedure MergeSort(var Vals:TSortArray;ACount:Integer);
var AVals:TSortArray;

  procedure Merge(ALo,AMid,AHi:Integer);
  var i,j,k,m:Integer;
  begin
    i:=0;
    for j:=ALo to AMid do
    begin
      AVals[i]:=Vals[j];
      inc(i);
      //copy lower half or Vals into temporary array AVals
    end;

    i:=0;j:=AMid + 1;k:=ALo;//j could be undefined after the for loop!
    while ((k < j) and (j <= AHi)) do
    if (AVals[i] < Vals[j]) then
    begin
      Vals[k]:=AVals[i];
      inc(i);inc(k);
    end else
    begin
      Vals[k]:=Vals[j];
      inc(k);inc(j);
    end;
    {locate next greatest value in Vals or AVals and copy it to the
     right position.}

    for m:=k to j - 1 do
    begin
      Vals[m]:=AVals[i];
      inc(i);
    end;
    //copy back any remaining, unsorted, elements
  end;

  procedure PerformMergeSort(ALo,AHi:Integer);
  var AMid:Integer;
  begin
    if (ALo < AHi) then
    begin
      AMid:=(ALo + AHi) shr 1;
      PerformMergeSort(ALo,AMid);
      PerformMergeSort(AMid + 1,AHi);
      Merge(ALo,AMid,AHi);   <==== passing the array as string causes AV breakdown here
    end;
  end;

begin
  SetLength(AVals, ACount);
  PerformMergeSort(0,ACount - 1);
end;

解决方案

Answer to the second question: Mergesort modification with duplicate deleting. Should work for strings.

//returns new valid length
function MergeSortRemoveDuplicates(var Vals: array of Integer):Integer;
var
  AVals: array of Integer;

   //returns index of the last valid element
  function Merge(I0, I1, J0, J1: Integer):Integer;
  var
    i, j, k, LC:Integer;
  begin
    LC := I1 - I0;
    for i := 0 to LC do
      AVals[i]:=Vals[i + I0];
      //copy lower half or Vals into temporary array AVals

    k := I0;
    i := 0;
    j := J0;
    while ((i <= LC) and (j <= J1)) do
    if (AVals[i] < Vals[j]) then begin
      Vals[k] := AVals[i];
      inc(i);
      inc(k);
    end else  if (AVals[i] > Vals[j]) then begin
      Vals[k]:=Vals[j];
      inc(k);
      inc(j);
    end else begin //duplicate
      Vals[k] := AVals[i];
      inc(i);
      inc(j);
      inc(k);
    end;

    //copy the rest
    while i <= LC do begin
      Vals[k] := AVals[i];
      inc(i);
      inc(k);
    end;

    if k <> j then
      while j <= J1 do begin
        Vals[k]:=Vals[j];
        inc(k);
        inc(j);
      end;

    Result := k - 1;
  end;

 //returns index of the last valid element

  function PerformMergeSort(ALo, AHi:Integer): Integer; //returns
  var
    AMid, I1, J1:Integer;
  begin

  //It would be wise to use Insertion Sort when (AHi - ALo) is small (about 32-100)
    if (ALo < AHi) then
    begin
      AMid:=(ALo + AHi) shr 1;
      I1 := PerformMergeSort(ALo, AMid);
      J1 := PerformMergeSort(AMid + 1, AHi);
      Result := Merge(ALo, I1, AMid + 1, J1);
    end else
      Result := ALo;
  end;

begin
  SetLength(AVals, Length(Vals) div 2 + 1);
  Result := 1 + PerformMergeSort(0, High(Vals));
end;


//short test
var
  A: array of Integer;
  i, NewLen: Integer;
begin
  Randomize;
  SetLength(A, 12);
  for i := 0 to High(A) do
    A[i] := Random(10);
   NewLen := MergeSortRemoveDuplicates(A);
   SetLength(A, NewLen);
   for i := 0 to High(A) do
     Memo1.Lines.Add(IntToStr(A[i]))
  end;

Simple modification for strings:

function MergeSortRemoveDuplicates(var Vals: array of String):Integer;
var
  AVals: array of String;

and test case:

var
  List: TStringList;
  Arr: array of string;
  i, n: Integer;
begin
  with TStringList.Create do try
    LoadFromFile('F:\m2.txt'); //contains some equal strings
    SetLength(Arr, Count);
    for i := 0 to Count - 1 do
      Arr[i] := Strings[i];
  finally
    Free
  end;
  n := MergeSortRemoveDuplicates(Arr);
  for i := 0 to n - 1 do
    Memo1.Lines.Add(Arr[i]);
end;

这篇关于德尔福归并为字符串数组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆