How to edit docx with nokogiri and rubyzip

Question

I'm using a combination of rubyzip and nokogiri to edit a .docx file. I use rubyzip to unzip the .docx file and nokogiri to parse and change the body of word/document.xml, but every time I close the zip file at the end, the .docx is corrupted and Word can neither open nor repair it. When I unzip the resulting .docx on my desktop, word/document.xml contains the content I changed it to, but all the other files are messed up. Could someone help me with this issue? Here is my code:

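require 'rubygems'
require 'zip/zip'   # rubyzip < 1.0 API (1.0+ uses require 'zip' and Zip::File)
require 'nokogiri'

# Open the .docx (a zip archive) and find the main document part
zip = Zip::ZipFile.open("test.docx")
doc = zip.find_entry("word/document.xml")

# Parse the XML and replace the text of the first <w:t> element
xml = Nokogiri::XML.parse(doc.get_input_stream)
wt = xml.root.xpath("//w:t", {"w" => "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}).first
wt.content = "New Text"

# Write the modified XML back into the same archive and save it
zip.get_output_stream("word/document.xml") {|f| f << xml.to_s}
zip.close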

Recommended answer

I ran into the same corruption problem with rubyzip last night. I solved it by copying everything to a new zip file, replacing files as necessary.

Here's my working proof of concept:

#!/usr/bin/env ruby

require 'rubygems'
require 'zip/zip' # rubyzip gem, pre-1.0 API (1.0+ uses require 'zip' and Zip::File)
require 'nokogiri'

class WordXmlFile
  def self.open(path, &block)
    self.new(path, &block)
  end

  # Open the source archive; with a block, yield self and close it afterwards
  def initialize(path, &block)
    @replace = {}   # entry name => replacement contents
    @zip = Zip::ZipFile.open(path)
    if block_given?
      yield(self)
      @zip.close
    end
  end

  # Replace the displayed text of each simple MERGEFIELD with rec[field_name]
  def merge(rec)
    xml = @zip.read("word/document.xml")
    doc = Nokogiri::XML(xml) {|x| x.noent}
    (doc/"//w:fldSimple").each do |field|
      if field.attributes['instr'].value =~ /MERGEFIELD (\S+)/
        name = $1
        text_node = (field/".//w:t").first
        if text_node
          text_node.content = rec[name].to_s   # content= escapes &, < and >
        else
          puts "No text node for #{name}"
        end
      end
    end
    # Keep the new XML in memory; save writes it into the output archive
    @replace["word/document.xml"] = doc.serialize :save_with => 0
  end

  # Write a brand-new archive: copy every entry, substituting replaced ones
  def save(path)
    Zip::ZipFile.open(path, Zip::ZipFile::CREATE) do |out|
      @zip.each do |entry|
        out.get_output_stream(entry.name) do |o|
          if @replace[entry.name]
            o.write(@replace[entry.name])
          else
            o.write(@zip.read(entry.name))
          end
        end
      end
    end
    @zip.close
  end
end

if __FILE__ == $0
  file = ARGV[0] or abort "usage: #{$0} input.docx [output.docx]"
  out_file = ARGV[1] || file.sub(/\.docx\z/, ' Merged.docx')
  w = WordXmlFile.open(file)
  w.merge('First_Name' => 'Eric', 'Last_Name' => 'Mason')
  w.save(out_file)
end
