Excel VBA 性能 - 100 万行 - 删除包含值的行,不到 1 分钟 [英] Excel VBA Performance - 1 million rows - Delete rows containing a value, in less than 1 min

查看:30
本文介绍了Excel VBA 性能 - 100 万行 - 删除包含值的行,不到 1 分钟的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图找到一种方法来过滤大量数据并在不到一分钟的时间内删除工作表中的行

I am trying to find a way to filter large data and remove rows in a worksheet, in less than one minute

目标:

  • 在第 1 列中查找包含特定文本的所有记录,并删除整行
  • 保持所有单元格格式(颜色、字体、边框、列宽)和公式不变

.

测试数据:

:

.

代码的工作原理:

  1. 首先关闭所有 Excel 功能
  2. 如果工作簿不为空且要删除的文本值存在于第1列

  1. It starts by turning all Excel features Off
  2. If the workbook is not empty and the text value to be removed exists in column 1

  • 将第 1 列的使用范围复制到数组中
  • 向后迭代数组中的每个值
  • 当它找到匹配项时:

  • Copies the used range of column 1 to an array
  • Iterates over every value in array backwards
  • When it finds a match:

  • 将单元格地址附加到格式为 "A11,A275,A3900,..."
  • 的 tmp 字符串中
  • 如果 tmp 变量长度接近 255 个字符
  • 使用 .Range("A11,A275,A3900,...").EntireRow.Delete Shift:=xlUp
  • 删除行
  • 将 tmp 重置为空并移至下一组行

.

主要问题是删除操作,总持续时间应该在一分钟以内.任何基于代码的解决方案都可以接受,只要它的执行时间低于 1 分钟.

The main issue is the Delete operation, and total duration time should be under one minute. Any code-based solution is acceptable as long as it performs under 1 minute.

这将范围缩小到极少数可接受的答案.已经提供的答案也非常简短且易于实施.一个在大约 30 秒内执行操作,因此至少有一个答案提供了可接受的解决方案,其他可能会发现也有用

This narrows the scope to very few acceptable answers. The answers already provided are also very short and easy to implement. One performs the operation in about 30 seconds, so there is at least one answer that provides an acceptable solution, and other may find it useful as well

.

我的主要初始功能:

Sub DeleteRowsWithValuesStrings()
    Const MAX_SZ As Byte = 240

    Dim i As Long, j As Long, t As Double, ws As Worksheet
    Dim memArr As Variant, max As Long, tmp As String

    Set ws = Worksheets(1)
    max = GetMaxCell(ws.UsedRange).Row
    FastWB True:    t = Timer

    With ws
        If max > 1 Then
            If IndexOfValInRowOrCol("Test String", , ws.UsedRange) > 0 Then
                memArr = .Range(.Cells(1, 1), .Cells(max, 1)).Value2
                For i = max To 1 Step -1

                    If memArr(i, 1) = "Test String" Then
                        tmp = tmp & "A" & i & ","
                        If Len(tmp) > MAX_SZ Then
                           .Range(Left(tmp, Len(tmp) - 1)).EntireRow.Delete Shift:=xlUp
                           tmp = vbNullString

                        End If
                    End If

                Next
                If Len(tmp) > 0 Then
                    .Range(Left(tmp, Len(tmp) - 1)).EntireRow.Delete Shift:=xlUp
                End If
                .Calculate
            End If
        End If
    End With
    FastWB False:   InputBox "Duration: ", "Duration", Timer - t
End Sub

<小时>

辅助函数(关闭和打开 Excel 功能):


Helper functions (turn Excel features off and on):

Public Sub FastWB(Optional ByVal opt As Boolean = True)
    With Application
        .Calculation = IIf(opt, xlCalculationManual, xlCalculationAutomatic)
        .DisplayAlerts = Not opt
        .DisplayStatusBar = Not opt
        .EnableAnimations = Not opt
        .EnableEvents = Not opt
        .ScreenUpdating = Not opt
    End With
    FastWS , opt
End Sub

Public Sub FastWS(Optional ByVal ws As Worksheet = Nothing, _
                  Optional ByVal opt As Boolean = True)
    If ws Is Nothing Then
        For Each ws In Application.ActiveWorkbook.Sheets
            EnableWS ws, opt
        Next
    Else
        EnableWS ws, opt
    End If
End Sub

Private Sub EnableWS(ByVal ws As Worksheet, ByVal opt As Boolean)
    With ws
        .DisplayPageBreaks = False
        .EnableCalculation = Not opt
        .EnableFormatConditionsCalculation = Not opt
        .EnablePivotTable = Not opt
    End With
End Sub

<小时>

找到带有数据的最后一个单元格(感谢@ZygD - 现在我在几种情况下对其进行了测试):


Finds last cell with data (thanks @ZygD - now I tested it in several scenarios):

Public Function GetMaxCell(Optional ByRef rng As Range = Nothing) As Range

    'Returns the last cell containing a value, or A1 if Worksheet is empty

    Const NONEMPTY As String = "*"
    Dim lRow As Range, lCol As Range

    If rng Is Nothing Then Set rng = Application.ActiveWorkbook.ActiveSheet.UsedRange
    If WorksheetFunction.CountA(rng) = 0 Then
        Set GetMaxCell = rng.Parent.Cells(1, 1)
    Else
        With rng
            Set lRow = .Cells.Find(What:=NONEMPTY, LookIn:=xlFormulas, _
                                        After:=.Cells(1, 1), _
                                        SearchDirection:=xlPrevious, _
                                        SearchOrder:=xlByRows)
            If Not lRow Is Nothing Then
                Set lCol = .Cells.Find(What:=NONEMPTY, LookIn:=xlFormulas, _
                                            After:=.Cells(1, 1), _
                                            SearchDirection:=xlPrevious, _
                                            SearchOrder:=xlByColumns)

                Set GetMaxCell = .Parent.Cells(lRow.Row, lCol.Column)
            End If
        End With
    End If
End Function

<小时>

返回数组中匹配项的索引,如果未找到匹配项,则返回 0:


Returns the index of a match in the array, or 0 if a match is not found:

Public Function IndexOfValInRowOrCol( _
                                    ByVal searchVal As String, _
                                    Optional ByRef ws As Worksheet = Nothing, _
                                    Optional ByRef rng As Range = Nothing, _
                                    Optional ByRef vertical As Boolean = True, _
                                    Optional ByRef rowOrColNum As Long = 1 _
                                    ) As Long

    'Returns position in Row or Column, or 0 if no matches found

    Dim usedRng As Range, result As Variant, searchRow As Long, searchCol As Long

    result = CVErr(9999) '- generate custom error

    Set usedRng = GetUsedRng(ws, rng)
    If Not usedRng Is Nothing Then
        If rowOrColNum < 1 Then rowOrColNum = 1
        With Application
            If vertical Then
                result = .Match(searchVal, rng.Columns(rowOrColNum), 0)
            Else
                result = .Match(searchVal, rng.Rows(rowOrColNum), 0)
            End If
        End With
    End If
    If IsError(result) Then IndexOfValInRowOrCol = 0 Else IndexOfValInRowOrCol = result
End Function

.

更新:

测试了 6 个解决方案(每个解决方案 3 个测试):Excel Hero 的解决方案是最快的所以far(删除公式)

Tested 6 solutions (3 tests each): Excel Hero's solution is the fastest so far (removes formulas)

.

以下是结果,从最快到最慢:

Here are the results, fastest to the slowest:

.

测试 1. 总共 100,000 条记录,要删除 10,000 条:

Test 1. Total of 100,000 records, 10,000 to be deleted:

1. ExcelHero()                    - 1.5 seconds

2. DeleteRowsWithValuesNewSheet() - 2.4 seconds

3. DeleteRowsWithValuesStrings()  - 2.45 minutes
4. DeleteRowsWithValuesArray()    - 2.45 minutes
5. QuickAndEasy()                 - 3.25 minutes
6. DeleteRowsWithValuesUnion()    - Stopped after 5 minutes

.

测试 2. 总共 100 万条记录,10 万条被删除:

Test 2. Total of 1 million records, 100,000 to be deleted:

1. ExcelHero()                    - 16 seconds (average)

2. DeleteRowsWithValuesNewSheet() - 33 seconds (average)

3. DeleteRowsWithValuesStrings()  - 4 hrs 38 min (16701.375 sec)
4. DeleteRowsWithValuesArray()    - 4 hrs 37 min (16626.3051757813 sec)
5. QuickAndEasy()                 - 5 hrs 40 min (20434.2104492188 sec)
6. DeleteRowsWithValuesUnion()    - N/A

.

注意事项:

  1. ExcelHero 方法:易于实现、可靠、速度极快,但删除了公式
  2. NewSheet 方法:易于实现、可靠且符合目标
  3. Strings 方法:更努力实现,可靠,但不符合要求
  4. Array 方法:类似于 Strings,但 ReDims 一个数组(Union 的更快版本)
  5. QuickAndEasy:易于实现(简短、可靠且优雅),但不符合要求
  6. Range Union:实现复杂度类似于 2 和 3,但速度太慢

我还通过引入异常值使测试数据更加真实:

I also made the test data more realistic by introducing unusual values:

  • 空单元格、范围、行和列
  • 特殊字符,如=[`~!@#$%^&*()_-+{}[]|;:'",.<>/?、单独和多个组合
  • 空格、制表符、空公式、边框、字体和其他单元格格式
  • 带小数的大数和小数 (=12.9999999999999 + 0.00000000000000001)
  • 超链接、条件格式规则
  • 数据范围内外的空格式
  • 任何其他可能导致数据问题的事情

推荐答案

我提供第一个答案作为参考

I'm providing the first answer as a reference

如果没有其他可用选项,其他人可能会发现它很有用

Others may find it useful, if there are no other options available

  • 实现结果的最快方法是不使用删除操作
  • 在 100 万条记录中,它平均在 33 秒内删除了 100,000 行

.

Sub DeleteRowsWithValuesNewSheet()  '100K records   10K to delete
                                    'Test 1:        2.40234375 sec
                                    'Test 2:        2.41796875 sec
                                    'Test 3:        2.40234375 sec
                                    '1M records     100K to delete
                                    'Test 1:        32.9140625 sec
                                    'Test 2:        33.1484375 sec
                                    'Test 3:        32.90625   sec
    Dim oldWs As Worksheet, newWs As Worksheet, rowHeights() As Long
    Dim wsName As String, t As Double, oldUsedRng As Range

    FastWB True:    t = Timer

    Set oldWs = Worksheets(1)
    wsName = oldWs.Name

    Set oldUsedRng = oldWs.Range("A1", GetMaxCell(oldWs.UsedRange))

    If oldUsedRng.Rows.Count > 1 Then                           'If sheet is not empty
        Set newWs = Sheets.Add(After:=oldWs)                    'Add new sheet
        With oldUsedRng
            .AutoFilter Field:=1, Criteria1:="<>Test String"
            .Copy                                               'Copy visible data
        End With
        With newWs.Cells
            .PasteSpecial xlPasteColumnWidths
            .PasteSpecial xlPasteAll                            'Paste data on new sheet
            .Cells(1, 1).Select                                 'Deselect paste area
            .Cells(1, 1).Copy                                   'Clear Clipboard
        End With
        oldWs.Delete                                            'Delete old sheet
        newWs.Name = wsName
    End If
    FastWB False:   InputBox "Duration: ", "Duration", Timer - t
End Sub

.

高层次:

  • 它创建一个新的工作表,并保留对初始工作表的引用
  • 自动过滤搜索文本的第 1 列:.AutoFilter Field:=1, Criteria1:="<>Test String"
  • 复制初始工作表中的所有(可见)数据
  • 将列宽、格式和数据粘贴到新工作表中
  • 删除初始工作表
  • 将新工作表重命名为旧工作表名称

它使用了问题中发布的相同帮助函数

It uses the same helper functions posted in the question

99% 的持续时间被自动过滤器使用

The 99% of the duration is used by the AutoFilter

.

到目前为止,我发现了一些限制,第一个可以解决:

There are a couple limitations I found so far, the first can be addressed:

  1. 如果初始工作表上有任何隐藏的行,它会取消隐藏它们

  1. If there are any hidden rows on the initial sheet, it unhides them

  • 需要一个单独的函数来隐藏它们
  • 根据实施情况,可能会显着增加持续时间

VBA 相关:

  • 它改变了工作表的代码名称;其他引用 Sheet1 的 VBA 将被破坏(如果有)
  • 它会删除与初始工作表关联的所有 VBA 代码(如果有)

.

关于使用像这样的大文件的一些注意事项:

A few notes about using large files like this:

  • 二进制格式 (.xlsb) 显着减小了文件大小(从 137 Mb 到 43 Mb)
  • 非托管条件格式规则可能导致指数性能问题

  • The binary format (.xlsb) reduce file size dramatically (from 137 Mb to 43 Mb)
  • Unmanaged Conditional Formatting rules can cause exponential performance issues

  • 评论和数据验证相同

从网络读取文件或数据比使用本地文件慢得多

Reading file or data from network is much slower than working with a locall file

这篇关于Excel VBA 性能 - 100 万行 - 删除包含值的行,不到 1 分钟的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆