How to parse XML with nokogiri without losing HTML entities?

Views: 115 | Posted: 2018/6/23 15:49:17 | Tags: html ruby nokogiri


Question

If you look at the AFTER section of the output below, Ruby is removing all the HTML entities. How can I parse XML with nokogiri without losing HTML entities?

--- BEFORE ---

<blog:entryFull>
&lt;p&gt;&lt;iframe src="http://w.soundcloud.com/player/?url=http%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F39858946&amp;amp;show_artwork=true" width="100%" height="166" frameborder="no" scrolling="no"&gt;&lt;/iframe&gt;&lt;/p&gt;</blog:entryFull>

--- AFTER --- 

<blog:entryFull>
piframe src="http://w.soundcloud.com/player/?url=http%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F39858946amp;show_artwork=true" width="100%" height="166" frameborder="no" scrolling="no"/iframe/p</blog:entryFull>
  </blog:example>

Here is the code:

require 'nokogiri'

# `item` is the path of the XML file being processed.
contents = File.read(item)

puts "--- BEFORE ---"
puts contents

puts "--- AFTER ---"
doc = Nokogiri::XML::DocumentFragment.parse(contents)
puts doc

Recommended answer

Your test file might contain some invalid entities, such as a bare & that is not written as &amp;. Compare an invalid and a valid input:

require 'nokogiri'

puts "--- INVALID ---"
# The bare "&" in "M&Ms" is not a valid entity reference.
invalid_xml = <<-XML
<blog:entryFull>invalid M&Ms</blog:entryFull>
<blog:entryFull>
&lt;p&gt;&lt;iframe src="http://w.soundcloud.com/player/?url=http%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F39858946&amp;amp;show_artwork=true" width="100%" height="166" frameborder="no" scrolling="no"&gt;&lt;/iframe&gt;&lt;/p&gt;</blog:entryFull>
XML
doc = Nokogiri::XML::DocumentFragment.parse(invalid_xml)
puts doc

puts "--- VALID ---"
# Here the ampersand is escaped as "&amp;".
valid_xml = <<-XML
<blog:entryFull>valid M&amp;Ms</blog:entryFull>
<blog:entryFull>
&lt;p&gt;&lt;iframe src="http://w.soundcloud.com/player/?url=http%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F39858946&amp;amp;show_artwork=true" width="100%" height="166" frameborder="no" scrolling="no"&gt;&lt;/iframe&gt;&lt;/p&gt;</blog:entryFull>
XML
doc = Nokogiri::XML::DocumentFragment.parse(valid_xml)
puts doc

Result:

$ ruby nokogiri.rb
--- INVALID ---
<blog:entryFull>invalid M</blog:entryFull>
<blog:entryFull>
piframe src="http://w.soundcloud.com/player/?url=http%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F39858946amp;show_artwork=true" width="100%" height="166" frameborder="no" scrolling="no"/iframe/p</blog:entryFull>
--- VALID ---
<blog:entryFull>valid M&amp;Ms</blog:entryFull>
<blog:entryFull>
&lt;p&gt;&lt;iframe src="http://w.soundcloud.com/player/?url=http%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F39858946&amp;amp;show_artwork=true" width="100%" height="166" frameborder="no" scrolling="no"&gt;&lt;/iframe&gt;&lt;/p&gt;</blog:entryFull>

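So, either:

- fix the input XML, or
- use the STRICT ParseOptions, so that invalid input raises an error instead of being silently repaired.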
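For the first option, one possible way to repair the input before handing it to nokogiri is to escape every bare ampersand. The sketch below is only an illustration: escape_bare_ampersands and BARE_AMPERSAND are hypothetical names, not part of nokogiri, and the pattern assumes that only XML's five predefined entities and numeric character references count as valid (so something like &nbsp; would be escaped too). It also does not special-case CDATA sections or comments.

```ruby
# Hypothetical helper, not part of nokogiri.
# Matches an "&" that does NOT start one of XML's five predefined entities
# (&amp; &lt; &gt; &quot; &apos;) or a numeric character reference
# (&#123; or &#x1F;).
BARE_AMPERSAND = /&(?!(?:amp|lt|gt|quot|apos|#\d+|#x\h+);)/

# Returns a copy of the string with every bare "&" replaced by "&amp;".
def escape_bare_ampersands(xml)
  xml.gsub(BARE_AMPERSAND, '&amp;')
end

raw = '<blog:entryFull>invalid M&Ms &lt;p&gt;</blog:entryFull>'
puts escape_bare_ampersands(raw)
# prints: <blog:entryFull>invalid M&amp;Ms &lt;p&gt;</blog:entryFull>
```

The cleaned string can then be passed to Nokogiri::XML::DocumentFragment.parse as in the question. Fixing the data at its source is still preferable when that is possible.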

Strict parsing example:

require 'nokogiri'

invalid_xml = <<-XML
<?xml version="1.0" encoding="UTF-8"?>
<root>
<blog:entryFull>invalid M&Ms</blog:entryFull>
<blog:entryFull>
&lt;p&gt;&lt;iframe src="http://w.soundcloud.com/player/?url=http%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F39858946&amp;amp;show_artwork=true" width="100%" height="166" frameborder="no" scrolling="no"&gt;&lt;/iframe&gt;&lt;/p&gt;</blog:entryFull>
</root>
XML

begin
  doc = Nokogiri::XML(invalid_xml) do |configure|
    configure.strict # strict parsing: raise on malformed XML instead of recovering
  end
  puts doc
rescue Nokogiri::XML::SyntaxError
  puts 'INVALID XML'
end
