How to read metadata information from docx documents?

Problem description

What I need to achieve is a Word document template (docx) that contains a Title, Author name, Date, etc.

Users will then fill in this template. I need to create a C# program that takes in the docx file and reads all the information of interest (title, name, date, ...).

So my questions are:

  1. How do I put the metadata into the template saying: this is Title, this is Date, this is Name, etc.? (not programmatically)

  2. How do I programmatically read that information?

Solution

I haven't looked at OpenXML in a while, so maybe someone can jump in and be a bit more descriptive.

One way to approach this would be to use Content Controls. In Office, you can create your template and then place one of these controls for each input of interest. They're under the Developer tab in Office.

After inserting your controls, you'll need to give each of them a unique name (for example, through the Tag field in the control's Properties dialog). Office will let them all share the same name, but you need to be able to uniquely identify each one in your template document.

You now need to get the data that was entered into these controls. Again, there are likely to be better solutions, but Eric White has all kinds of great OpenXML material, so here's one of his posts: Iterating over Content Controls
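As a rough sketch of that reading step (this is not Eric White's code), the snippet below opens the .docx as a zip archive using only the .NET base class library and collects the text of every tagged content control. The class and method names are made up for illustration. It assumes the main document part lives at word/document.xml, which is where Word normally saves it, and that each control's unique name was stored in its Tag property.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, named for this example only.
public static class ContentControlReader
{
    // WordprocessingML namespace; content controls are stored as w:sdt elements.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Opens the .docx (a zip package) and reads the main document part.
    // Assumption: the part is at word/document.xml.
    public static Dictionary<string, string> ReadTaggedValues(string docxPath)
    {
        using (ZipArchive archive = ZipFile.OpenRead(docxPath))
        {
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml")
                ?? throw new InvalidOperationException("word/document.xml not found.");
            using (Stream stream = entry.Open())
            {
                return ReadTaggedValues(XDocument.Load(stream));
            }
        }
    }

    // Returns tag -> text for every content control that has a non-empty tag.
    public static Dictionary<string, string> ReadTaggedValues(XDocument document)
    {
        var values = new Dictionary<string, string>();
        foreach (XElement sdt in document.Descendants(W + "sdt"))
        {
            string tag = sdt.Element(W + "sdtPr")?.Element(W + "tag")?.Attribute(W + "val")?.Value;
            XElement content = sdt.Element(W + "sdtContent");
            if (string.IsNullOrEmpty(tag) || content == null)
            {
                continue; // untagged or empty control: nothing to identify it by
            }

            // Concatenate all text runs inside the control's content.
            // If two controls share a tag, the later one overwrites the earlier one.
            values[tag] = string.Concat(content.Descendants(W + "t").Select(t => t.Value));
        }
        return values;
    }
}
```

Calling ContentControlReader.ReadTaggedValues on a filled-in document would then give a dictionary you can index by tag, such as values["Title"], provided the template actually used those tags.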

I think there are problems with finding content controls nested within a table. So, if you do that, I think you have to loop over the elements of the table specifically to find the content controls within it.
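One possible cause of the table problem, and this is a guess rather than something confirmed here, is code that only inspects the direct children of the document body. The sketch below uses hypothetical sample markup and made-up helper names to contrast a top-level-only query with a full descendant walk in LINQ to XML.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, named for this example only.
public static class NestedContentControls
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Illustrative markup: one control directly in the body, one inside a table cell.
    public const string SampleXml =
@"<w:document xmlns:w=""http://schemas.openxmlformats.org/wordprocessingml/2006/main"">
  <w:body>
    <w:sdt>
      <w:sdtPr><w:tag w:val=""Title""/></w:sdtPr>
      <w:sdtContent><w:p><w:r><w:t>Report</w:t></w:r></w:p></w:sdtContent>
    </w:sdt>
    <w:tbl><w:tr><w:tc>
      <w:sdt>
        <w:sdtPr><w:tag w:val=""Author""/></w:sdtPr>
        <w:sdtContent><w:p><w:r><w:t>Jane</w:t></w:r></w:p></w:sdtContent>
      </w:sdt>
    </w:tc></w:tr></w:tbl>
  </w:body>
</w:document>";

    // Looks only at direct children of w:body, so controls inside tables are missed.
    public static string[] TopLevelTags(XDocument doc) =>
        Tags(doc.Root.Element(W + "body").Elements(W + "sdt"));

    // Walks the whole tree, so controls inside table cells are included.
    public static string[] AllTags(XDocument doc) =>
        Tags(doc.Descendants(W + "sdt"));

    // Extracts the non-empty tag values from a sequence of w:sdt elements.
    private static string[] Tags(IEnumerable<XElement> controls) =>
        controls
            .Select(c => c.Element(W + "sdtPr")?.Element(W + "tag")?.Attribute(W + "val")?.Value)
            .Where(tag => !string.IsNullOrEmpty(tag))
            .ToArray();
}
```

With the sample markup, TopLevelTags returns only Title, while AllTags returns Title and Author. If a descendant walk like this fits your documents, an explicit loop over table elements may not be needed.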

Also, you're probably going to want to save a .docx from your .dotx file. I don't think there's a built-in "one-liner" method for that in OpenXML; however, you can create a new Word document and then write the file stream of the template into the newly created docx file. Again, of course, there may be better solutions out there.
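One way to sketch the template-to-document step without the Open XML SDK is to copy the template's bytes to a new .docx and switch the main part's content type from "template" to "document" in [Content_Types].xml. The class and method names below are made up for illustration, and the sketch assumes the template is a macro-free .dotx. If you do use the SDK, I believe WordprocessingDocument.ChangeDocumentType covers the same ground, but check that against the SDK version you have.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper, named for this example only.
public static class TemplateToDocument
{
    // Content types of the main part in a .dotx and in a .docx respectively.
    private const string TemplateType =
        "application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml";
    private const string DocumentType =
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml";

    public static void CreateDocumentFromTemplate(string templatePath, string docxPath)
    {
        // Start from a byte-for-byte copy of the template.
        File.Copy(templatePath, docxPath, overwrite: true);

        using (ZipArchive archive = ZipFile.Open(docxPath, ZipArchiveMode.Update))
        {
            ZipArchiveEntry entry = archive.GetEntry("[Content_Types].xml")
                ?? throw new InvalidOperationException("[Content_Types].xml not found.");

            // Read the current content-type map.
            string xml;
            using (var reader = new StreamReader(entry.Open(), Encoding.UTF8))
            {
                xml = reader.ReadToEnd();
            }

            // Replace the entry with a copy that declares the main part as a document.
            entry.Delete();
            ZipArchiveEntry replacement = archive.CreateEntry("[Content_Types].xml");
            using (var writer = new StreamWriter(replacement.Open(), new UTF8Encoding(false)))
            {
                writer.Write(xml.Replace(TemplateType, DocumentType));
            }
        }
    }
}
```

The template file itself is left untouched, so the same .dotx can be reused for every new document.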

Have you been here? There's lots of good stuff: Introduction to OpenXML

Additionally, Eric has been releasing more and more videos on the OpenXML YouTube channel
