How to parse Word documents with Ruby?

Question

Does anyone know of a library that I can use on OS X/Linux to parse Word files and output the content as HTML?

I've had a look at win32ole, but as far as I can see it's Windows-only, although I could be wrong.

Any suggestions?

Recommended answer

The Word document format (ignoring docx for the moment) is terrible and was constantly changing. IMHO that is why there are so few (read: zero) Ruby libraries out there to parse them.

What I recommend doing is using JRuby and some of the established Java libraries for reading the doc format. Google should help you out there: http://schmidt.devlib.org/java/libraries-word.html.

There is a Java project for reading Microsoft file formats, POI (http://poi.apache.org/), and it does have Ruby bindings (http://poi.apache.org/poi-ruby.html), but I'm not sure how up-to-date those are. On their site it says the Ruby bindings are for 1.8.2...
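
To make the JRuby route a bit more concrete, here is a minimal sketch of what it could look like. It is not taken from the POI docs or from the Ruby bindings mentioned above: it assumes you run it under JRuby with the Apache POI jars (poi and poi-scratchpad) already on the classpath, and it uses POI's `WordExtractor` class to pull plain paragraph text out of a legacy .doc file. The helper names `extract_paragraphs` and `paragraphs_to_html` are made up for illustration, and the HTML output is deliberately naive (one `<p>` per paragraph, no styles, tables, or images).

```ruby
require 'cgi'

# Hypothetical helper: reads paragraph text from a legacy .doc file using
# Apache POI's WordExtractor. Assumes JRuby with the POI jars on the classpath.
def extract_paragraphs(path)
  unless RUBY_PLATFORM == 'java'
    raise NotImplementedError, 'requires JRuby with Apache POI on the classpath'
  end

  require 'java'
  stream = Java::JavaIo::FileInputStream.new(path)
  begin
    extractor = Java::OrgApachePoiHwpfExtractor::WordExtractor.new(stream)
    # getParagraphText returns a Java String[]; convert it to a Ruby Array.
    extractor.getParagraphText.to_a
  ensure
    stream.close
  end
end

# Hypothetical helper: wraps each non-empty paragraph in <p> tags,
# escaping HTML special characters. Plain Ruby, no Java needed.
def paragraphs_to_html(paragraphs)
  paragraphs
    .map { |text| text.to_s.strip }
    .reject(&:empty?)
    .map { |text| "<p>#{CGI.escapeHTML(text)}</p>" }
    .join("\n")
end

# Only runs when invoked directly with a file argument.
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  puts paragraphs_to_html(extract_paragraphs(ARGV[0]))
end
```

Assuming the file is saved as `doc_to_html.rb` (name is arbitrary), something like `jruby -J-cp "poi.jar:poi-scratchpad.jar" doc_to_html.rb report.doc` should print the HTML; the exact jar names and any extra dependencies depend on the POI version you download. Keeping the HTML step separate from the extraction step means you can swap in a different extractor later without touching the output code.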
