Delphi-检查元素是否在TArray< string>中存在两次. [英] Delphi - check if element exists twice in TArray<string>

查看:276
本文介绍了Delphi-检查元素是否在TArray< string>中存在两次.的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有此代码

var
arr: TArray<string>;
e1, e2, e3, e4: string;
begin

e1 := 'val1';
e2 := 'val2';
e3 := 'val3';
e4 := 'val4';

arr := TArray<string>.Create(e1, e2, e3, e4);

我现在需要检查上述数组两次中是否存在e1到e4的最佳方法是什么? 我还应该跳过检查任何元素是否为空值的情况.

I need now to check if e1 to e4 exists in the above array twice what is the best way to do it ? I should also skip checking any element have a null value.

还请告知我是否应该手动释放该阵列

also please advise if I should free this array manually

谢谢

推荐答案

玩算法很有趣;如果原始数组足够大,无法使用朴素的O(n^2)算法,并且由于OP使用的是具有泛型的Delphi版本,则我建议使用TDictionary<string, integer>来跟踪所有字符串而没有排序,并找出重复项.

For the fun of playing with algorithms; If the original array is large-enough to make the naive O(n^2) algorithm impractical, and since the OP is using a version of Delphi that has generics, I propose using an TDictionary<string, integer> to track all strings without sorting, and identify duplicates.

这将是有效的,原因有很多:

This would be efficient for a number of reasons:

  • TDictionary提供了恒定时间的插入,几乎是O(n).我们从一开始就知道数组的大小,因此我们无需增加TDictionary就可以使用它.这使得整个算法为O(n).
  • 由于字符串已经存在于数组中,并且Delphi字符串已进行引用计数,因此将字符串放入数组中实际上不会复制字符串!
  • 使用此算法,数组只能遍历一次.
  • TDictionary offers constant-time inserts, it's almost O(n). We know the size of the array from the beginning so we can use TDictionary without ever growing it. This makes the whole algorithm O(n).
  • Since the strings are already in an array, and Delphi strings are reference-counted, putting strings into the array isn't going to actually copy the string!
  • With this algorithm the array is walked only once.

代码:

type
  TZeroWidthRecord = record
  end;

function FindFirstDuplicate(const ArrayOfString: array of string): Integer;
var Dict: TDictionary<string, TZeroWidthRecord>;
    i: Integer;
    ZW: TZeroWidthRecord;
begin
  Dict := TDictionary<string, TZeroWidthRecord>.Create(Length(ArrayOfString));
  try
    for i:=0 to High(ArrayOfString) do
      try
        Dict.Add(ArrayOfString[i], ZW);
      except on E:Exception do
        Exit(i);
      end;
  finally Dict.Free;
  end;
  // No duplicates found:
  Result := -1;
end;


为回答David的评论,我制作了一个简短的测试程序,将基于TDictionary的算法与sort-and-search算法进行了比较.我创建了随机的字符串数组,然后尝试查找第一个重复的字符串.我的数组不包含重复项,如果存在重复项,则TDictionary的运行时间将与平均首次命中重复项成比例.例如,如果TDictionary平均在数组中间找到一个副本,则TDict算法的平均运行时间将是一半.基于排序的算法需要对整个数组进行排序,而排序是最耗时的.


To answer David's comment I made a short test program that compares this TDictionary based algorithm to the sort-and-search algorithm. I created random arrays of strings then tried finding the first duplicate. My arrays contain no duplicates, if there were duplicates then the run-time for the TDictionary would be proportional to the the average first-hit duplicate. If, for example, on average the TDictionary would find a duplicate in the middle of the array then the average runtime for the TDict algorithm would be half. The sort-based algorithm needs to sort the whole array, and the sorting is what takes up the most time.

与基于排序和字典的算法一样,人们需要用 realistic 数据进行测试.例如,如果我已经用OP的问题中的短字符串进行了测试,那么TDict和sort之间就不会有竞争:Dict甚至对于琐碎的长度数组来说也会更快.但是,一旦平均字符串长度增加,基于排序的算法就会开始变得更好.但是,这又取决于字符串:例如,如果大多数字符串共享一个长前缀,则排序算法中的比较"阶段将花费更长的时间,从而使TDictionary看起来更好!

As always with sorting and dictionary based algorithms, one needs to test with realistic data. If, for example, I'd have tested with the short strings in the OP's questions, there would be no competition between TDict and sort: Dict would be faster even for trivial length arrays. But the moment the average string length increases, the sort based algorithm starts getting better; But then again, this depends on the strings: if, for example, most of the strings would share a long prefix then the "compare" stage in the sorting algorithm would take significantly longer, making TDictionary look better again!

*==========*===========================*
|          | Number of strings         |
| Avg str  | in the Array for          |
| length   | TDictionary to be faster  |
*======================================*
| 7        | 33                        |
| 10       | 73                        |
| 13       | 73                        |
| 16       | 163                       |
| 19       | 163                       |
| 22       | 366                       |
| 25       | 366                       |
| 28       | 549                       |
| 37       | 2776                      |
| 40       | 2776                      |
| 43       | 2776                      |
| 46       | 4164                      |
| 49       | 9369                      |
| 52       | 9369                      |
| 55       | 9369                      |
| 58       | 21079                     |
*==========*===========================*

测试表2,在数组的1/2处重复

如果在数组中间恰好找到第一个重复项,则将是结果.注意平均字符串长度58的巨大差异:

Test table 2, duplicate at 1/2 of the array

This would be the results if the first duplicate is found exactly in the middle of the array. Notice the huge difference for the average string length 58:

*==========*===========================*
|          | Number of strings         |
| Avg str  | in the Array for          |
| length   | TDictionary to be faster  |
*======================================*
| 30       | 109                       |
| 33       | 163                       |
| 36       | 163                       |
| 58       | 366                       |
*==========*===========================*

测试表3,重复1/4

如果在数组的1/4左右找到第一个重复项,就会发生这种情况:

Test table 3, duplicate at 1/4

And this would happen if the first duplicate is found around 1/4 of the array:

*==========*===========================*
|          | Number of strings         |
| Avg str  | in the Array for          |
| length   | TDictionary to be faster  |
*======================================*
| 29       | 73                        |
| 32       | 73                        |
| 38       | 73                        |
| 57       | 109                       |
*==========*===========================*

这是测试应用程序: http://pastebin.com/vDznwKtZ

这篇关于Delphi-检查元素是否在TArray&lt; string&gt;中存在两次.的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆