使用页码从Word doc到Excel中提取包含特定关键字的字符串值 [英] Extracting string values containing a certain key word from Word doc to Excel with the page number
问题描述
我是VBA的新手,正在尝试从Word文档中提取一些包含某个关键字的字符串值到Excel.例如,有一些国家代码,例如USA.001.01.033592,我想从单词doc中提取所有看起来像国家代码的字符串值,然后将它们收集到Excel电子表格中.
I am new to VBA and trying to extract some string values that contain a certain key word from a Word document to Excel. For example, there are country codes such as USA.001.01.033592 and I want to extract all the string values look like the country code from the word doc and collect them into a Excel spreadsheet.
我正在查看的国家/地区代码的格式
The country codes I'm looking at as a format of
- USA.xxx.xx.xxxxxx
- JPA.xxx.xx.xxxxxx
- FRA.xxx.xx.xxxxxxX代表数字,问题在于这些代码位于正文段落,段落内的表格和脚注中.另外,当我检索代码时,我也想提取页码.
有什么方法可以一次从主要段落,表格和脚注中提取所需的数据以及它们的页码?
Is there any way I can extract the data I want from the main paragraphs, tables, and footnotes at once with their page number?
我的代码草稿很粗,但根本没有用.谁能帮忙吗?
I have a rough draft of my code but it's not working at all. Could anyone please help?
这是我的代码:
Option Explicit
Sub Footnotes()
Dim appExcel As Object
Dim objSheet As Object
Dim aRange As Range
Dim intRowCount As Integer
intRowCount = 1
Set aRange = ActiveDocument.Range
With aRange.Find
Do
.Text = "USA." or "JPA." or "FRA."
.Execute
If .Found Then
aRange.Expand Unit:=wdSentence
aRange.Copy
aRange.Collapse wdCollapseEnd
If objSheet Is Nothing Then
Set appExcel = CreateObject("Excel.Application")
Set objSheet = appExcel.workbooks.Open("C:\Users\Footnotes.xlsx").Sheets("Sheet1")
intRowCount = 1
End If
objSheet.Cells(intRowCount, 1).Select
objSheet.Paste
intRowCount = intRowCount + 1
End If
Loop While .Found
End With
If Not objSheet Is Nothing Then
appExcel.workbooks(1).Close True
appExcel.Quit
Set objSheet = Nothing
Set appExcel = Nothing
End If
Set aRange = Nothing
End Sub
当前,我正在尝试从Word文档中使用VBA,但如果最好从excel文件开始,请告诉我.
currently, I'm trying to use VBA from the word document but if it's better to start from the excel file please let me know.
推荐答案
由于您有不同的搜索模式,因此可以将Word的 Find
与通配符一起使用.在Word MVP网站上有一个很好的参考书.这将使 Find
可以返回您要查找的整个字符串,而无需扩展找到的范围.
As you have a distinct pattern to search for you can use Word's Find
with wildcards. There is a good reference for that at the Word MVP website. This will enable Find
to return the entire string that you are looking for without needing to extend the found range.
一旦找到了范围,就可以使用 Information
属性检索文本以传递到Excel并获取页码.
Once you have the found range you can then retrieve the text to pass to Excel and obtain the page number using the Information
property.
Word文档由称为 StoryRanges
.尽管表格只是包含它们的范围的一部分,但脚注包含在单独的StoryRange中.下面的代码遍历StoryRanges并检查 type type 当前是.我这样做是为了让您可以根据需要添加其他类型.
A Word document consists of a number of parts referred to as StoryRanges
. Whilst tables are just part of the range that contains them, footnotes are contained in a separate StoryRange. The code below loops through the StoryRanges and checks which type the current one is. I have done it this way so that you can add other types if you need to.
您的问题中没有提到要使用页码做什么,因此您需要为此修改以下代码.
It wasn't stated in your question what you wanted to do with the page number so you'll need to modify the code below for that.
Sub Footnotes()
Dim appExcel As Excel.Application
Dim objSheet As Excel.Worksheet
Dim findRange As Range
Dim intRowCount As Integer
Dim pageNum As Long
If objSheet Is Nothing Then
Set appExcel = CreateObject("Excel.Application")
Set objSheet = appExcel.workbooks.Open("C:\Users\Footnotes.xlsx").Sheets("Sheet1")
End If
intRowCount = 2
'Set findRange = ActiveDocument.Range
For Each findRange In ActiveDocument.StoryRanges
With findRange.Find
.Text = "[UJF][PRS]A.[0-9]{3}.[0-9]{2}.[0-9]{6}"
.MatchWildcards = True
Do While .Execute = True
pageNum = CLng(findRange.Information(wdActiveEndPageNumber))
objSheet.Cells(intRowCount, 1).Value = findRange.Text
objSheet.Cells(intRowCount, 2).Value = pageNum
intRowCount = intRowCount + 1
findRange.Collapse wdCollapseEnd
Loop
End With
Next findRange
If Not objSheet Is Nothing Then
appExcel.workbooks(1).Close True
appExcel.Quit
Set objSheet = Nothing
Set appExcel = Nothing
End If
Set findRange = Nothing
End Sub
上面的代码只能找到问题中列出的国家/地区代码.要查找任何国家/地区代码,请将 Find.Text
更改为"[AZ] {3}.[0-9] {3}.[0-9] {2}.[0-9] {6}"
The code above only finds the country codes listed in the question. To find any country code change Find.Text
to "[A-Z]{3}.[0-9]{3}.[0-9]{2}.[0-9]{6}"
这篇关于使用页码从Word doc到Excel中提取包含特定关键字的字符串值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!