使用页码从Word doc到Excel中提取包含特定关键字的字符串值 [英] Extracting string values containing a certain key word from Word doc to Excel with the page number

查看:126
本文介绍了使用页码从Word doc到Excel中提取包含特定关键字的字符串值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我是VBA的新手,正在尝试从Word文档中提取一些包含某个关键字的字符串值到Excel.例如,有一些国家代码,例如USA.001.01.033592,我想从单词doc中提取所有看起来像国家代码的字符串值,然后将它们收集到Excel电子表格中.

I am new to VBA and trying to extract some string values that contain a certain key word from a Word document to Excel. For example, there are country codes such as USA.001.01.033592 and I want to extract all the string values look like the country code from the word doc and collect them into a Excel spreadsheet.

我正在查看的国家/地区代码的格式

The country codes I'm looking at as a format of

  1. USA.xxx.xx.xxxxxx
  2. JPA.xxx.xx.xxxxxx
  3. FRA.xxx.xx.xxxxxxX代表数字,问题在于这些代码位于正文段落,段落内的表格和脚注中.另外,当我检索代码时,我也想提取页码.

有什么方法可以一次从主要段落,表格和脚注中提取所需的数据以及它们的页码?

Is there any way I can extract the data I want from the main paragraphs, tables, and footnotes at once with their page number?

我的代码草稿很粗,但根本没有用.谁能帮忙吗?

I have a rough draft of my code but it's not working at all. Could anyone please help?

这是我的代码:

Option Explicit

Sub Footnotes()

    Dim appExcel As Object
    Dim objSheet As Object
    Dim aRange As Range
    Dim intRowCount As Integer
    intRowCount = 1
    Set aRange = ActiveDocument.Range

    With aRange.Find
    Do
        .Text = "USA." or "JPA." or "FRA."
        .Execute
        If .Found Then
            aRange.Expand Unit:=wdSentence
            aRange.Copy
            aRange.Collapse wdCollapseEnd
            If objSheet Is Nothing Then
               Set appExcel = CreateObject("Excel.Application")
               Set objSheet = appExcel.workbooks.Open("C:\Users\Footnotes.xlsx").Sheets("Sheet1")
               intRowCount = 1
            End If
            objSheet.Cells(intRowCount, 1).Select
            objSheet.Paste
            intRowCount = intRowCount + 1
        End If
        Loop While .Found
    End With

    If Not objSheet Is Nothing Then
    appExcel.workbooks(1).Close True
    appExcel.Quit
    Set objSheet = Nothing
    Set appExcel = Nothing
    End If
    Set aRange = Nothing

End Sub

当前,我正在尝试从Word文档中使用VBA,但如果最好从excel文件开始,请告诉我.

currently, I'm trying to use VBA from the word document but if it's better to start from the excel file please let me know.

推荐答案

由于您有不同的搜索模式,因此可以将Word的 Find 与通配符一起使用.在Word MVP网站上有一个很好的参考书.这将使 Find 可以返回您要查找的整个字符串,而无需扩展找到的范围.

As you have a distinct pattern to search for you can use Word's Find with wildcards. There is a good reference for that at the Word MVP website. This will enable Find to return the entire string that you are looking for without needing to extend the found range.

一旦找到了范围,就可以使用 Information 属性检索文本以传递到Excel并获取页码.

Once you have the found range you can then retrieve the text to pass to Excel and obtain the page number using the Information property.

Word文档由称为 StoryRanges .尽管表格只是包含它们的范围的一部分,但脚注包含在单独的StoryRange中.下面的代码遍历StoryRanges并检查 type type 当前是.我这样做是为了让您可以根据需要添加其他类型.

A Word document consists of a number of parts referred to as StoryRanges. Whilst tables are just part of the range that contains them, footnotes are contained in a separate StoryRange. The code below loops through the StoryRanges and checks which type the current one is. I have done it this way so that you can add other types if you need to.

您的问题中没有提到要使用页码做什么,因此您需要为此修改以下代码.

It wasn't stated in your question what you wanted to do with the page number so you'll need to modify the code below for that.

Sub Footnotes()

   Dim appExcel As Excel.Application
   Dim objSheet As Excel.Worksheet
   Dim findRange As Range
   Dim intRowCount As Integer
   Dim pageNum As Long
   
   If objSheet Is Nothing Then
      Set appExcel = CreateObject("Excel.Application")
      Set objSheet = appExcel.workbooks.Open("C:\Users\Footnotes.xlsx").Sheets("Sheet1")
   End If
   intRowCount = 2
   
   'Set findRange = ActiveDocument.Range
   For Each findRange In ActiveDocument.StoryRanges
      With findRange.Find
         .Text = "[UJF][PRS]A.[0-9]{3}.[0-9]{2}.[0-9]{6}"
         .MatchWildcards = True
         Do While .Execute = True
            pageNum = CLng(findRange.Information(wdActiveEndPageNumber))
            objSheet.Cells(intRowCount, 1).Value = findRange.Text
            objSheet.Cells(intRowCount, 2).Value = pageNum
            intRowCount = intRowCount + 1
            findRange.Collapse wdCollapseEnd
         Loop
      End With
   Next findRange

   If Not objSheet Is Nothing Then
      appExcel.workbooks(1).Close True
      appExcel.Quit
      Set objSheet = Nothing
      Set appExcel = Nothing
   End If
   Set findRange = Nothing
End Sub

上面的代码只能找到问题中列出的国家/地区代码.要查找任何国家/地区代码,请将 Find.Text 更改为"[AZ] {3}.[0-9] {3}.[0-9] {2}.[0-9] {6}"

The code above only finds the country codes listed in the question. To find any country code change Find.Text to "[A-Z]{3}.[0-9]{3}.[0-9]{2}.[0-9]{6}"

这篇关于使用页码从Word doc到Excel中提取包含特定关键字的字符串值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆