Parse a Word document into an Excel file


Problem description

I have a Word document containing data that I would like to parse into an Excel file. The source files are hundreds of pages long. I have been working with VBA, but I have just started learning the language and have run into a lot of difficulty trying to read a .doc file. I have been able to use the Open and Line Input statements to read from a .txt file, but I get only gibberish when I try the same with the .doc file.

I have included links to two screenshots.

The first is a screenshot of a sample of my input data.
http://img717.imageshack.us/i/input.jpg/

The second is a screenshot of my desired output.
http://img3.imageshack.us/i/outputg.jpg/

I have developed an algorithm of what I want to accomplish. I am just having difficulties coding. Below is the pseudocode that I have developed.

    Variables:
         string     line = blank
         series_title = blank
         folder_title = blank

         int  series_number = 0
              box_number = 0
              folder_number = 0
              year = 0
    do while the <end_of_document> has not been reached
        input line
        If the first word in the line is "series" 
            store <series_number>
            store the string after ":" into <series_title>
        end if
        call parse_box(rest of line)
        output <series_number> <series_title> <box_number> <folder_number> <folder_title> <year>
    end do while

    function parse_box(current line)
        If the first word in the line is "box" 
            store <box_number>
        end if
        call parse_folder(rest of line)
    end function

    function parse_folder(current line)
        If first word is "Folder"
            store <folder_number>
        end if
        call parse_folder_title_and_year(rest of line)
    end function

    function parse_folder_title_and_year(current line)
        string temp_folder_title
        store everything as <temp_folder_title> until end of line
        if last word in <temp_folder_title> is a year
            store <year>
        end if
        if <temp_folder_title> is empty/blank
            //use <folder_title> from before
        else
            <folder_title> is <temp_folder_title> minus <year>
        end if
    end function

Thanks ahead of time for all your help and suggestions

Solution

The Open and Line Input statements generally only work on plain text files (things you can read in Notepad); a .doc file is a binary format, which is why you get gibberish. If you want to read from Microsoft Word documents programmatically, you'll have to add the Microsoft Word 12.0 Object Library (or the most recent version on your system) to your VBAProject references (Tools > References in the VBA editor) and use the Word API to open and read the document.

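    ' Start a Word instance; requires the Word object library reference.
    Dim oWrd As Word.Application
    Set oWrd = New Word.Application

    ' DocumentPath is a String holding the full path to the .doc file.
    Dim odoc As Word.Document
    Set odoc = oWrd.Documents.Open(Filename:=DocumentPath, Visible:=False)

    Dim singleLine As Word.Paragraph
    Dim lineText As String

    ' Iterate the opened document, not ActiveDocument (which does not exist in Excel).
    For Each singleLine In odoc.Paragraphs
        lineText = singleLine.Range.Text   ' includes the trailing paragraph mark
        'Do what you've gotta do
    Next singleLine

    odoc.Close SaveChanges:=False
    oWrd.Quit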
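
If adding the reference is not an option, the same loop can be written with late binding. The following is only a minimal sketch, assuming it runs from Excel VBA; ReadWordParagraphs and CleanParagraphText are hypothetical names that do not appear in the original answer. It also strips the trailing paragraph mark that Range.Text returns with every paragraph.

```vba
Option Explicit

' Hypothetical helper: removes trailing paragraph marks (Chr(13)), line feeds
' and table end-of-cell marks (Chr(7)) from paragraph text, then trims spaces.
Public Function CleanParagraphText(ByVal rawText As String) As String
    Dim result As String
    result = rawText
    Do While Len(result) > 0
        Select Case Right$(result, 1)
            Case vbCr, vbLf, Chr$(7)
                result = Left$(result, Len(result) - 1)
            Case Else
                Exit Do
        End Select
    Loop
    CleanParagraphText = Trim$(result)
End Function

' Hypothetical routine: returns the cleaned paragraphs of a document as a Collection.
' Late binding (CreateObject) means no Word library reference is needed.
Public Function ReadWordParagraphs(ByVal documentPath As String) As Collection
    Dim wordApp As Object
    Dim doc As Object
    Dim para As Object
    Dim lines As New Collection

    Set wordApp = CreateObject("Word.Application")
    Set doc = wordApp.Documents.Open(FileName:=documentPath, ReadOnly:=True)

    For Each para In doc.Paragraphs
        lines.Add CleanParagraphText(para.Range.Text)
    Next para

    doc.Close False   ' False: do not save changes
    wordApp.Quit
    Set ReadWordParagraphs = lines
End Function
```

The sketch has no error handling, so a failed Open would leave a hidden Word instance running; a real macro would probably want an On Error handler that still calls Quit.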

Word doesn't have a concept of "Lines". You can read text ranges, paragraphs, and sentences. Experiment and find which of these gives you your input text in manageable blocks.
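
Once the text is available paragraph by paragraph, the pseudocode from the question could be sketched roughly as follows. This is not a definitive implementation. The input screenshots are not reproduced here, so it assumes each paragraph holds exactly one item shaped like "Series 1: Title", "Box 2" or "Folder 3 Some title 1957". All procedure names are hypothetical, and lines is assumed to be a Collection of paragraph texts with the trailing paragraph marks already removed.

```vba
Option Explicit

' True when candidate looks like a four-digit year (assumed range: 1000-2999).
Public Function IsYear(ByVal candidate As String) As Boolean
    IsYear = (candidate Like "[12]###")
End Function

' True when lineText starts with keyword, ignoring case and leading spaces.
Public Function StartsWithKeyword(ByVal lineText As String, ByVal keyword As String) As Boolean
    StartsWithKeyword = (StrComp(Left$(LTrim$(lineText), Len(keyword)), keyword, vbTextCompare) = 0)
End Function

' Returns the run of digits at the start of lineText (0 when there is none).
' Val is avoided here because it skips embedded spaces, so "3 1957" would not give 3.
Public Function LeadingNumber(ByVal lineText As String) As Long
    Dim s As String
    Dim i As Long
    s = LTrim$(lineText)
    i = 1
    Do While i <= Len(s)
        If Not (Mid$(s, i, 1) Like "#") Then Exit Do
        i = i + 1
    Loop
    If i > 1 Then LeadingNumber = CLng(Left$(s, i - 1))
End Function

' Splits "Annual report 1957" into the title "Annual report" and the year 1957.
' yearOut is set to 0 when the last word is not a year.
Public Sub SplitTitleAndYear(ByVal lineText As String, ByRef titleOut As String, ByRef yearOut As Long)
    Dim trimmed As String, lastWord As String
    Dim lastSpace As Long

    trimmed = Trim$(lineText)
    lastSpace = InStrRev(trimmed, " ")
    lastWord = Mid$(trimmed, lastSpace + 1)

    If IsYear(lastWord) Then
        yearOut = CLng(lastWord)
        titleOut = Trim$(Left$(trimmed, lastSpace))
    Else
        yearOut = 0
        titleOut = trimmed
    End If
End Sub

' Walks the lines, carrying series, box and folder values forward, and writes
' one row per folder line to ws (columns A to F).
Public Sub WriteParsedRows(ByVal lines As Collection, ByVal ws As Worksheet)
    Dim entry As Variant
    Dim lineText As String, rest As String, newTitle As String
    Dim seriesTitle As String, folderTitle As String
    Dim seriesNumber As Long, boxNumber As Long, folderNumber As Long, folderYear As Long
    Dim splitPos As Long, nextRow As Long

    nextRow = 1
    For Each entry In lines
        lineText = Trim$(CStr(entry))
        If StartsWithKeyword(lineText, "Series") Then
            seriesNumber = LeadingNumber(Mid$(lineText, Len("Series") + 1))
            splitPos = InStr(lineText, ":")
            If splitPos > 0 Then seriesTitle = Trim$(Mid$(lineText, splitPos + 1))
        ElseIf StartsWithKeyword(lineText, "Box") Then
            boxNumber = LeadingNumber(Mid$(lineText, Len("Box") + 1))
        ElseIf StartsWithKeyword(lineText, "Folder") Then
            rest = LTrim$(Mid$(lineText, Len("Folder") + 1))
            folderNumber = LeadingNumber(rest)
            ' Drop the folder number; what remains is the title and optional year.
            splitPos = InStr(rest, " ")
            If splitPos > 0 Then rest = Mid$(rest, splitPos + 1) Else rest = ""
            SplitTitleAndYear rest, newTitle, folderYear
            ' An empty title means "reuse the previous folder title", as in the pseudocode.
            If Len(newTitle) > 0 Then folderTitle = newTitle
            ws.Cells(nextRow, 1).Resize(1, 6).Value = Array(seriesNumber, seriesTitle, _
                boxNumber, folderNumber, folderTitle, IIf(folderYear = 0, "", folderYear))
            nextRow = nextRow + 1
        End If
    Next entry
End Sub
```

If the real document puts the box, folder and title on a single line, the If/ElseIf chain would need to peel keywords off the same string in turn instead of treating each paragraph as one item.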
