Convert Word (.docx) to DocBook


Problem description

I have been tasked with finding a way to convert a large number of .docx files to DocBook 5. Currently, we open each file in OpenOffice and save it as DocBook. This is time-consuming, and I am confident there is a better way. The files will then be processed further into our custom RELAX NG schema, so the conversion does not need to be flawless. I have looked around and will continue to investigate some leads, but have not found anything useful.

Looking at "Convert doc/docx to semantic HTML", the answers there suggest upCast, but it does not seem appropriate for my needs.

I am looking for something freely available that I can use from the command line, since I would ultimately like to batch process our files. I have included the linux, python, and java tags because these are the environments I am most comfortable with, but I would be willing to bend for the right solution. I am trying to do some research before I go out and reinvent the wheel.

Recommended answer

There are several ways to script this, both using external scripts and scripts within OpenOffice. See the following links for some examples:

  • http://juretta.com/log/2006/08/10/convert_microsoft_word_to_docbook_xml_using_ruby_and_openoffice/
  • http://www.oooninja.com/2008/02/batch-command-line-file-conversion-with.html
  • http://www.xml.com/pub/a/2006/01/11/from-microsoft-to-openoffice.html
  • http://mail.python.org/pipermail/python-announce-list/2006-May/004951.html
  • http://dag.wieers.com/home-made/unoconv/

Some of the above links aren't using Java or Python, but the principles still apply and the scripts are typically short enough that they can be ported (the first example is in Ruby, but it's my personal favorite due to the simplicity).
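As a rough illustration of the external-script approach, here is a minimal Python sketch that batch-converts a directory by calling unoconv (the last link above) once per file. It is not taken from any of the linked articles. It assumes unoconv is installed and on the PATH, that its `-f docbook` format name and `-o` output option behave as its documentation describes, and that a working OpenOffice installation is available to it. The function names `build_command` and `convert_all` are made up for this example.

```python
import subprocess
import sys
from pathlib import Path


def build_command(src, outdir):
    """Return the unoconv argument list for one .docx file.

    Assumption: unoconv accepts "-f docbook" and "-o <dir>".
    """
    return ["unoconv", "-f", "docbook", "-o", str(outdir), str(src)]


def convert_all(indir, outdir, runner=subprocess.run):
    """Convert every .docx directly under indir; return the files that failed."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    failed = []
    for src in sorted(Path(indir).glob("*.docx")):
        # One external process per file, so a bad document does not stop the batch.
        result = runner(build_command(src, outdir))
        if result.returncode != 0:
            failed.append(src)
    return failed


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python docx2docbook.py INPUT_DIR OUTPUT_DIR
    for path in convert_all(sys.argv[1], sys.argv[2]):
        print("failed:", path, file=sys.stderr)
```

Since the files are processed further into a custom schema anyway, collecting failures and moving on is probably more useful here than stopping at the first error. The `runner` parameter exists only so the loop can be exercised without OpenOffice installed.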
