Read an MS Word .doc file with Ruby and win32ole

Problem description

I'm trying to read a .doc file with Ruby, using the win32ole library.

Here is my code:

require 'win32ole'

class DocParser

  def initialize
    @content = ''
  end

  def read_file file_path
    begin
      word = WIN32OLE.connect( 'Word.Application' )
      doc  = word.activedocument
    rescue
      word = WIN32OLE.new( 'Word.Application' )
      doc  = word.documents.open( file_path )
    end
    word.visible = false
    doc.sentences.each{ |x| @content = @content + x.text }

    word.quit
    @content
  end
end

I call it with DocParser.new.read_file('path/file.doc')

When I run this from the Rails console (rails c), I don't have any problems; it works fine. But when I run it in the running Rails app (e.g. after a button click), once in a while (every 3-4 times) this code crashes with this error:



WIN32OLERuntimeError (failed to create WIN32OLE object from `Word.Application'
    HRESULT error code:0x800401f0
      CoInitialize has not been called.):
  lib/file_parsers/doc_parser.rb:14:in `initialize'
  lib/file_parsers/doc_parser.rb:14:in `new'
  lib/file_parsers/doc_parser.rb:14:in `rescue in read_file'
  lib/file_parsers/doc_parser.rb:10:in `read_file'
  lib/search_engine.rb:10:in `block in search'
  lib/search_engine.rb:43:in `block in each_file_in'
  lib/search_engine.rb:42:in `each_file_in'
  lib/search_engine.rb:8:in `search'
  app/controllers/home_controller.rb:9:in `search'


  Rendered c:/Ruby193/lib/ruby/gems/1.9.1/gems/actionpack-4.1.1/lib/action_dispatch/middleware/templates/rescues/_source.erb (0.0ms)
  Rendered c:/Ruby193/lib/ruby/gems/1.9.1/gems/actionpack-4.1.1/lib/action_dispatch/middleware/templates/rescues/_trace.text.erb (2.0ms)
  Rendered c:/Ruby193/lib/ruby/gems/1.9.1/gems/actionpack-4.1.1/lib/action_dispatch/middleware/templates/rescues/_request_and_response.text.erb (2.0ms)
  Rendered c:/Ruby193/lib/ruby/gems/1.9.1/gems/actionpack-4.1.1/lib/action_dispatch/middleware/templates/rescues/diagnostics.erb (56.0ms)

Additionally, sometimes this code reads the .doc file successfully, but then Rails crashes a few seconds later: look at this gist

What is the problem, and how can I fix it? Please help!

Recommended answer

I don't know the difference between rails c and rails, so I'll give some general advice.

First, it is a bad idea to run this in a web server: every request starts Word on the server, so what happens if multiple users start using this at the same time?
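
If the code does stay in the web app for now, one way to limit the damage is to let only one request drive Word at a time. The following is a minimal sketch under two assumptions: there is a single Rails process, and `WordLock` is a hypothetical name. A Ruby `Mutex` does not coordinate between separate server processes. It also only serializes access. It does not by itself perform the per-thread COM initialization that the `CoInitialize has not been called` message refers to.

```ruby
require 'thread'

# Hypothetical helper: a process-wide lock so that only one thread
# (i.e. one Rails request) talks to Word at any moment.
# It does NOT coordinate between separate server processes.
module WordLock
  LOCK = Mutex.new

  # Runs the given block while holding the lock and returns the block's value.
  def self.synchronize(&block)
    LOCK.synchronize(&block)
  end
end

# Usage in a controller or service object:
#   text = WordLock.synchronize { DocParser.new.read_file(path) }
```

Requests that arrive while Word is busy simply wait. That is acceptable for occasional use but becomes a bottleneck under real load, which is the answer's point.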

You would be better off converting your .doc files to another format first, such as .rtf or .docx (a batch conversion?), and then using other gems that don't require Word itself.
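
A one-off batch conversion can still use Word through win32ole, but from a standalone script instead of the web server. This is a sketch under a few assumptions. It needs Word 2007 or later for .docx support. `convert_doc_files` is a hypothetical helper. 12 is Word's `wdFormatXMLDocument` value for the `FileFormat` argument of `SaveAs`. The Word application object is passed in, so the function can be exercised without Word installed.

```ruby
# Word's WdSaveFormat value for .docx files (wdFormatXMLDocument).
WD_FORMAT_XML_DOCUMENT = 12

# Hypothetical batch converter: opens each .doc with the given Word
# application object, saves a .docx next to it and closes the original.
# Files that don't end in .doc are skipped. Returns the written .docx paths.
def convert_doc_files(word, paths)
  paths.select { |p| p =~ /\.doc\z/i }.map do |src|
    dest = src.sub(/\.doc\z/i, '.docx')
    doc = word.documents.open(src)
    begin
      doc.saveas(dest, WD_FORMAT_XML_DOCUMENT)
    ensure
      doc.close(false) # close without saving the .doc again
    end
    dest
  end
end

# Usage on Windows, from a script (not from a request):
#   require 'win32ole'
#   word = WIN32OLE.new('Word.Application')
#   begin
#     files = Dir.glob('C:/docs/*.doc').map { |p| p.tr('/', '\\') }
#     convert_doc_files(word, files)
#   ensure
#     word.quit
#   end
```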

If you keep it like this, consider not closing Word (remove the word.quit) and only closing the document itself; the instance will then be picked up the next time by WIN32OLE.connect.
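
A sketch of that suggestion, assuming the rest of the app stays as it is. The parser attaches to a running Word instance or starts a hidden one. It always opens the requested file; the original reads `activedocument` when the connect succeeds, which may be a different document. It collects the sentences as before and closes only the document. Content is also no longer accumulated in an instance variable across calls. The optional `word_factory` constructor argument is a hypothetical addition so the class can be tried without Word.

```ruby
begin
  require 'win32ole'
rescue LoadError
  # win32ole only exists on Windows; a fake Word object can be injected instead.
end

class DocParser
  # word_factory: optional callable returning a Word.Application object
  # (hypothetical hook, mainly for testing).
  def initialize(word_factory = nil)
    @word_factory = word_factory || method(:connect_or_start_word)
  end

  def read_file(file_path)
    word = @word_factory.call
    doc  = word.documents.open(file_path)
    parts = []
    begin
      doc.sentences.each { |s| parts << s.text }
    ensure
      doc.close(false) # close the document without saving; keep Word running
    end
    parts.join
  end

  private

  # Attach to a running Word instance, or start a hidden one.
  def connect_or_start_word
    WIN32OLE.connect('Word.Application')
  rescue WIN32OLERuntimeError
    word = WIN32OLE.new('Word.Application')
    word.visible = false
    word
  end
end

# Usage:
#   DocParser.new.read_file('C:\\docs\\file.doc')
```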

While testing, you'd better keep Word visible (word.visible = true) so that you can see what is happening (errors?). I notice your path uses forward slashes, while in this case backslashes are needed, but since your code runs a few times before the error, I suppose that is not the problem.
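
Following the answer's remark about backslashes, a small hypothetical helper can switch the separators before a path is handed to Word. On Windows, `File.expand_path` still returns forward slashes, so the conversion is needed even after expanding a relative path.

```ruby
# Hypothetical helper: turn a Ruby-style path ("C:/docs/file.doc")
# into a Windows-style one ("C:\docs\file.doc") before passing it to Word.
def to_windows_path(path)
  path.tr('/', '\\')
end

# On Windows you would typically expand a relative path first, e.g.
#   DocParser.new.read_file(to_windows_path(File.expand_path('path/file.doc')))
```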

Hope this helps.