VBA Word的重复删除无法有效运行 [英] Duplicate removal for VBA Word not working effectively

查看:180
本文介绍了VBA Word的重复删除无法有效运行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个程序可以删除重复项,并且一切正常.只是冻结了大数据集,即1到250万个字.

I have a program to remove duplicates and everything is working properly. It is just freezing with large data sets i.e. 1 to 2.5 million words.

我的方法有什么问题?有更好的吗?

What is wrong with my approach? Is there a better one?

Sub DeleteDuplicateParagraphs()

Dim p1 As Paragraph
Dim p2 As Paragraph
Dim DupCount As Long
Dim StartTime As Double
Dim SecondsElapsed As Double

StartTime = Timer

For Each p1 In ActiveDocument.Paragraphs
  If p1.range.Text <> vbCr Then

    For Each p2 In ActiveDocument.Paragraphs
      If p1.range.Text = p2.range.Text Then
        DupCount = DupCount + 1
        If p1.range.Text = p2.range.Text And DupCount > 1 Then p2.range.Delete
      End If
    Next p2
  End If
  DupCount = 0
Next p1

SecondsElapsed = Round(Timer - StartTime, 2)

MsgBox "This code ran successfully in " & SecondsElapsed & " seconds", vbInformation
DupCount = 0

End Sub

推荐答案

尝试一下(首先将对Microsoft脚本运行时的引用添加到您的VBA项目中):

Try this (first add a reference to the Microsoft Scripting Runtime to your VBA project):

Sub DeleteDuplicateParagraphs()
  Dim p As Paragraph
  Dim d As New Scripting.Dictionary
  Dim t As Variant
  Dim i As Integer
  Dim StartTime As Single

  StartTime = Timer

  ' collect duplicates
  For Each p In ActiveDocument.Paragraphs
    t = p.Range.Text
    If t <> vbCr Then
      If Not d.Exists(t) Then d.Add t, New Scripting.Dictionary
      d(t).Add d(t).Count + 1, p
    End If
  Next

  ' eliminate duplicates
  Application.ScreenUpdating = False
  For Each t In d
    For i = 2 To d(t).Count
      d(t)(i).Range.Delete
    Next
  Next
  Application.ScreenUpdating = True

  MsgBox "This code ran successfully in " & Round(Timer - StartTime, 2) & " seconds", vbInformation
End Sub

这利用了Scripting.Dictionary是一个哈希表的事实,该哈希表旨在非常快速地将唯一键与值相关联.因此,它非常适合发现重复的密钥.字典键必须是字符串,我们可以方便地使用段落文本.

This makes use of the fact that the Scripting.Dictionary is a hash table that is geared towards very quickly associating unique keys with values. It is therefore very good at spotting duplicate keys. Dictionary keys have to be strings, conveniently we can use the paragraph texts for that.

对于值,我们使用更多的字典对象,仅因为它们比VBA的数组好得多.在其中,我们以相同的文本收集对实际段落实例的引用.

For values we use more dictionary objects, solely for the fact that they work a lot better than VBA's arrays. In them we collect the references to the actual paragraph instances with the same text.

实际上,删除重复的段落是一件很简单的事.

Actually deleting duplicate paragraphs is a very simple matter afterwards.

注意:上面代码中的重复检测部分非常快.但是,如果Word在大型文档中变得无响应,则它位于重复删除部分,即由于Word的撤消缓冲区.

Note: The duplicate detection part in the above code is very fast. However, if Word becomes unresponsive in large documents then it's in the duplicate removal part, namely because of Word's undo buffer.

罪魁祸首是段落范围被一个接一个地删除,导致Word建立一个很大的撤消缓冲区.不幸的是,(我知道)没有办法

The culprit is that the paragraph ranges are deleted one after another, causing Word to build a very large undo buffer. Unfortunately there is no way (that I know of) to either

  • 一步删除多个单独的范围(这只会导致撤消缓冲区中只有一个条目),或者
  • 完全从VBA禁用撤消缓冲区

在消除重复项"循环中定期调用UndoClear可能会有所帮助,禁用ScreenUpdating也不是一个坏主意:

Calling UndoClear periodically in the "eliminate duplicates" loop might help, disabling ScreenUpdating is also not a bad idea:

' eliminate duplicates
Dim x As Integer

Application.ScreenUpdating = False
For Each t In d
  x = x + 1
  For i = 2 To d(t).Count
    d(t)(i).Range.Delete
  Next
  If x Mod 50 = 0 Then ActiveDocument.UndoClear
Next
ActiveDocument.UndoClear
Application.ScreenUpdating = True

这篇关于VBA Word的重复删除无法有效运行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆