Parse a Word document into an Excel file
I have a Word document containing data that I would like to parse into an Excel file. The source files are hundreds of pages long. I have been working with VBA, but I have only just started learning the language and have run into a lot of difficulty trying to read a .doc file. I have been able to use the Open and Line Input statements to read from a .txt file, but I get only gibberish when I try the same with the .doc file.
I have included links to two screenshots.
The first is a screenshot of a sample of my input data.
http://img717.imageshack.us/i/input.jpg/
The second is a screenshot of my desired output.
http://img3.imageshack.us/i/outputg.jpg/
I have worked out an algorithm for what I want to accomplish; I am just having difficulty coding it. Below is the pseudocode I have developed.
Variables:
    string line = blank
    string series_title = blank
    string folder_title = blank
    int series_number = 0
    int box_number = 0
    int folder_number = 0
    int year = 0

do while <end_of_document> has not been reached
    input line
    if the first word in the line is "series"
        store <series_number>
        store the string after ":" into <series_title>
    end if
    call parse_box(rest of line)
    output <series_number> <series_title> <box_number> <folder_number> <folder_title> <year>
end do while

function parse_box(current line)
    if the first word in the line is "box"
        store <box_number>
    end if
    call parse_folder(rest of line)
end function

function parse_folder(current line)
    if the first word is "Folder"
        store <folder_number>
    end if
    call parse_folder_title_and_year(rest of line)
end function

function parse_folder_title_and_year(current line)
    string temp_folder_title
    store everything as <temp_folder_title> until end of line
    if the last word in <temp_folder_title> is a year
        store <year>
    end if
    if <temp_folder_title> is empty/blank
        // use <folder_title> from before
    else
        <folder_title> is <temp_folder_title> minus <year>
    end if
end function
Thanks in advance for all your help and suggestions.
The Open and Line Input statements generally only work on plain text files (things you can read in Notepad). A .doc file is a binary format, which is why you get gibberish. If you want to read from Microsoft Word documents programmatically, you'll have to add the Microsoft Word 12.0 Object Library (or the most recent version on your system) to your VBAProject references, and use the Word API to open and read the document.
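' Requires a reference to the Microsoft Word Object Library (Tools > References)
' DocumentPath is a String holding the full path to the .doc file
Dim oWrd As Word.Application
Dim odoc As Word.Document
Dim singleLine As Word.Paragraph
Dim lineText As String
Set oWrd = New Word.Application
Set odoc = oWrd.Documents.Open(FileName:=DocumentPath, ReadOnly:=True, Visible:=False)
' Iterate over the document that was just opened, not ActiveDocument
For Each singleLine In odoc.Paragraphs
    lineText = singleLine.Range.Text
    'Do what you've gotta do
Next singleLine
odoc.Close SaveChanges:=False
oWrd.Quit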
Word doesn't have a concept of "lines". You can read text ranges, paragraphs, and sentences. Experiment and find what works best for getting your input text into manageable blocks.
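One practical detail when reading paragraphs: the text returned by Range.Text includes the trailing paragraph mark, so it usually needs cleaning before you test the first or last word. Below is a minimal sketch of two helpers for that, loosely following the parse_folder_title_and_year step in the question. The function names CleanParagraphText and SplitTrailingYear are made up for illustration, and treating "a year" as a trailing four-digit number is an assumption about your data, not something confirmed by the screenshots.

```vba
Option Explicit

' Hypothetical helper: strips the control characters Word adds to
' paragraph text so the result can be compared and split safely.
Function CleanParagraphText(ByVal rawText As String) As String
    Dim s As String
    s = Replace(rawText, vbCr, "")      ' paragraph mark, Chr(13)
    s = Replace(s, Chr(7), "")          ' end-of-cell marker inside tables
    s = Replace(s, Chr(11), " ")        ' manual line break (Shift+Enter)
    CleanParagraphText = Trim$(s)
End Function

' Hypothetical helper: if the last word of source is a four-digit number
' (assumed to be a year), returns True and splits it off from the title.
' Otherwise returns False, with title set to the whole text and yr to 0.
Function SplitTrailingYear(ByVal source As String, _
                           ByRef title As String, _
                           ByRef yr As Long) As Boolean
    Dim pos As Long
    Dim lastWord As String

    source = Trim$(source)
    pos = InStrRev(source, " ")         ' 0 when there is no space
    lastWord = Mid$(source, pos + 1)

    If lastWord Like "####" Then
        yr = CLng(lastWord)
        title = Trim$(Left$(source, pos))
        SplitTrailingYear = True
    Else
        title = source
        yr = 0
        SplitTrailingYear = False
    End If
End Function
```

Inside the paragraph loop you could call lineText = CleanParagraphText(singleLine.Range.Text) and then pass the remainder of the line to SplitTrailingYear. If the function returns an empty title, keep the previous folder title, as the pseudocode suggests.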