How to enumerate a Word document using the Office Interop API?


Question

I want to traverse all the elements of a Word document one by one and process each element according to its type (header, sentence, table, image, text box, shape, etc.). I searched the Office Interop API for an enumerator or object that represents document elements but failed to find one. The API offers Sentences, Paragraphs, and Shapes collections but doesn't provide a generic object that can point to the next element. For example:

<header of document>
<plain text sentences>
<table with many rows,columns>
<text box>
<image>
<footer>

(Please imagine this as a Word document.)

So I want an enumerator that first gives me <header of document>, then on the next iteration gives me <plain text sentences>, then <table with many rows,columns>, and so on. Does anyone know how to achieve this? Is it possible?

I am using C#, Visual Studio 2005, and Word 2003.

Thanks a lot.

Recommended answer

The reason there is no simple iterator is that Word documents can be far more complex than the simple structure outlined in your question.

For example, a document may have separate headers and footers for the first page and for even and odd pages, may contain more than one section with a different header and footer setup, and may contain footnotes, comments, and revisions. Objects such as tables, text boxes, images, and shapes may appear inline with the text or floating. In short, there is no fixed sequence of elements.
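To make the header and footer point concrete, here is a minimal sketch that visits every header and footer slot of every section. It assumes a reference to the Word primary interop assembly and an already opened `Word.Document`; the class name `HeaderFooterDump` and the method `Dump` are hypothetical names chosen for illustration.

```csharp
using System;
using Word = Microsoft.Office.Interop.Word;

// Hypothetical helper, not part of the interop API.
static class HeaderFooterDump
{
    // The three header/footer slots Word defines for each section.
    static readonly Word.WdHeaderFooterIndex[] Slots =
    {
        Word.WdHeaderFooterIndex.wdHeaderFooterPrimary,
        Word.WdHeaderFooterIndex.wdHeaderFooterFirstPage,
        Word.WdHeaderFooterIndex.wdHeaderFooterEvenPages
    };

    // Assumes 'doc' was opened elsewhere, e.g. through Application.Documents.Open.
    public static void Dump(Word.Document doc)
    {
        foreach (Word.Section section in doc.Sections)
        {
            foreach (Word.WdHeaderFooterIndex slot in Slots)
            {
                Word.HeaderFooter header = section.Headers[slot];
                if (header.Exists)
                {
                    Console.WriteLine("Section {0}, header {1}: {2}",
                        section.Index, slot, header.Range.Text);
                }

                Word.HeaderFooter footer = section.Footers[slot];
                if (footer.Exists)
                {
                    Console.WriteLine("Section {0}, footer {1}: {2}",
                        section.Index, slot, footer.Range.Text);
                }
            }
        }
    }
}
```

Even this small loop can report several headers and footers for one document, which is why a single <header of document> element does not map cleanly onto the object model.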

You would have to check how complex your input documents are and, based on that analysis, decide how to iterate over the paragraphs and the images, shapes, and other objects attached to them.
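For reasonably simple documents, one possible approach is to walk the body paragraph by paragraph and inspect what is attached to each paragraph's range. The sketch below is one way to do that, not the only one. It assumes an already opened `Word.Document`, and the names `BodyWalker` and `Walk` are hypothetical. It uses the `get_Information` accessor form so that it also compiles with the older C# compiler mentioned in the question.

```csharp
using System;
using Word = Microsoft.Office.Interop.Word;

// Hypothetical helper, not part of the interop API.
static class BodyWalker
{
    // Visits the main text story in document order and reports a rough
    // element type for each paragraph plus any shapes attached to it.
    public static void Walk(Word.Document doc)
    {
        foreach (Word.Paragraph paragraph in doc.Paragraphs)
        {
            Word.Range range = paragraph.Range;

            // True when the paragraph lies inside a table cell.
            bool inTable = (bool)range.get_Information(
                Word.WdInformation.wdWithInTable);

            if (inTable)
            {
                Console.WriteLine("[table cell] " + range.Text);
            }
            else if (paragraph.OutlineLevel !=
                     Word.WdOutlineLevel.wdOutlineLevelBodyText)
            {
                Console.WriteLine("[heading] " + range.Text);
            }
            else
            {
                Console.WriteLine("[text] " + range.Text);
            }

            // Pictures and objects that sit inline with the text.
            foreach (Word.InlineShape inline in range.InlineShapes)
            {
                Console.WriteLine("  [inline shape] " + inline.Type);
            }

            // Floating shapes (including text boxes) anchored in this paragraph.
            foreach (Word.Shape shape in range.ShapeRange)
            {
                Console.WriteLine("  [floating shape] " + shape.Name);
            }
        }
    }
}
```

Note that this only covers the main text story. Headers, footers, footnotes, and the text inside text boxes live in separate stories and would need their own passes, which is where the analysis of your input documents comes in.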
