我们如何在 XSLT 中将 Microsoft Word DOCX 文件转换为 HTML? [英] How do we convert a Microsoft Word DOCX file to HTML in XSLT?

查看:24
本文介绍了我们如何在 XSLT 中将 Microsoft Word DOCX 文件转换为 HTML?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个关于将 Word DOCX XML (OOXML) 文件转换为 HTML 格式的项目.

I have project about transforming Word DOCX XML (OOXML) files to HTML format.

我使用 XML Spy 和 XSLT、XPath、XML 进行此转换.

I use XML Spy and XSLT, XPath, XML for this transformation.

想象一下我用 XSLT 编写程序并转换它的单个 Word 文件.但我的主管说,如果我更改文件中的值,该方法将不起作用.

Imagine a single Word file that I write a program in XSLT and transform it. But my supervisor says that if i change a value in the file that approach won't work.

我同意这一点,因为我只为该文档指定了代码,因为我知道其中包含的内容.

I agree with that because I specify the code just for that document because I know what contains in it.

但是,我们如何在 XSLT 中编写通用代码来将所有 Word 文件转换为格式良好的 HTML 文档(因为 Word 文档可能彼此有很大不同)?

But, how do we write a general code in XSLT to transform all the Word files as well-formed HTML document (since a word document can be a lot different than each other)?

问题是我想用 XSLT 来做吗?这里有什么不对吗?或者我只是对此太混乱了.

The problem is that I am trying to do it with XSLT? Is something wrong here isn't there? Or am i just being so chaotic about that.

推荐答案

您使用 XSLT 将 DOCX 文件转换为 HTML 的计划从根本上是合理的.XSLT 非常适合此目的,因为它非常适合从 XML 映射到 XML(或 (X)HTML).

Your plan to use XSLT to transform DOCX files to HTML is fundamentally sound. XSLT is ideal for this purpose as it is well suited for mapping from XML to XML (or (X)HTML).

您面临的挑战是基于 DOCX 的 XML 很复杂.Ecma Office Open XML 第 1 部分 - 基础知识和标记语言参考仅此一项就超过 5K 页.如果您非常了解 XML、XML 名称空间、XSLT、HTML 和 CSS,那么您将只是"必须学习 OOXML 的一些基础知识才能开始.

Your challenge will be that the XML underlying DOCX is complex. Ecma Office Open XML Part 1 - Fundamentals And Markup Language Reference alone is over 5K pages long. If you know XML, XML namespaces, XSLT, HTML, and CSS well, you'll "just" have to learn some basics of OOXML to get started.

如果您从根本上了解 OOXML,那么您不必担心更改值.从段落中文本运行的概念开始:w:tw:rw:p.

The concern about changing a value won't matter if you do this robustly and fundamentally understand OOXML. Start with the notion of runs of text in paragraphs: w:t, w:r and w:p.

Eric White 已经写了大量关于 OOXML 的文章,甚至专门将其转换为 HTML.请参阅 将 Open XML WordprocessingML 转换为 XHtml 以获取优秀的文章和示例.

Eric White has written extensively on OOXML in general and even transforming it to HTML specifically. See Transforming Open XML WordprocessingML to XHtml for excellent articles and examples.

这篇关于我们如何在 XSLT 中将 Microsoft Word DOCX 文件转换为 HTML?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆