Convert doc/docx to semantic HTML


Question




I would like to convert doc/docx documents to semantic HTML.

Some wishes/requirements:

1. Semantic HTML, such that headings in the document become <h1>, <h2>, etc., tables become <table>, and so forth.

2. It should preferably handle headings, lists, tables and images. Graphs and math formulas are a nice extra.

3. It doesn't have to convert straight from doc/docx to HTML; it could use an intermediary format, such as XML or DocBook.

4. It should work programmatically, and with a large number of documents.

The closest thing to a solution I've found so far is http://holloway.co.nz/docvert/index.html, but unfortunately it has a few bugs and a small user base, and it can't handle a lot of documents. It is more of a proof of concept.

Solution

There's a tool called upCast that can convert Word documents into XML.
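To illustrate the intermediate-XML route in general terms (this is not upCast and not its API), here is a minimal Python sketch that relies only on the standard library. A .docx file is a zip archive whose main text lives in word/document.xml, so the sketch reads that XML directly and maps paragraphs to <h1>..<h6>, <p> and <table>. The function names are made up for this example, and it assumes headings use the default English style IDs "Heading1" to "Heading6"; lists, images, graphs and formulas are not handled.

```python
import html
import re
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def paragraph_text(p):
    """Concatenate the text runs (<w:t>) found inside one <w:p>."""
    return "".join(t.text or "" for t in p.iterfind(".//w:t", NS))


def paragraph_to_html(p):
    """Render one <w:p> as <h1>..<h6> or <p>, based on its style ID."""
    style = p.find("w:pPr/w:pStyle", NS)
    style_id = style.get(f"{{{W}}}val", "") if style is not None else ""
    # Assumption: headings carry the default style IDs "Heading1".."Heading6".
    match = re.fullmatch(r"Heading([1-6])", style_id)
    tag = f"h{match.group(1)}" if match else "p"
    return f"<{tag}>{html.escape(paragraph_text(p))}</{tag}>"


def table_to_html(tbl):
    """Render one <w:tbl> as a plain <table> of <tr>/<td> cells."""
    rows = []
    for tr in tbl.iterfind("w:tr", NS):
        cells = []
        for tc in tr.iterfind("w:tc", NS):
            cell_text = " ".join(paragraph_text(p) for p in tc.iterfind("w:p", NS))
            cells.append(f"<td>{html.escape(cell_text)}</td>")
        rows.append("<tr>" + "".join(cells) + "</tr>")
    return "<table>" + "".join(rows) + "</table>"


def docx_to_html(source):
    """Convert a .docx (path or file-like object) to an HTML fragment."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    body = root.find("w:body", NS)
    parts = []
    for child in (body if body is not None else []):
        if child.tag == f"{{{W}}}p":
            parts.append(paragraph_to_html(child))
        elif child.tag == f"{{{W}}}tbl":
            parts.append(table_to_html(child))
        # Other elements (e.g. section properties) are skipped in this sketch.
    return "\n".join(parts)
```

For a large batch, one could call docx_to_html on each file yielded by pathlib.Path(folder).glob("*.docx") and write the result next to it. This only covers .docx; the older binary .doc format is not a zip archive and would first need converting with another tool.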
