How to make Nokogiri transparently return un/encoded HTML entities untouched?
Problem description
How can I get Nokogiri to leave both encoded HTML entities and unencoded characters (like German umlauts) untouched?
I.e.:
# this is fine
node = Nokogiri::HTML.fragment('<p>&ouml;</p>')
node.to_s # => '<p>&ouml;</p>'
# this is not
node = Nokogiri::HTML.fragment('<p>ö</p>')
node.to_s # => '<p>&ouml;</p>'
# this is what I need
node = Nokogiri::HTML.fragment('<p>ö</p>')
node.to_s # => '<p>ö</p>'
I've tried experimenting with both PARSE_OPTIONS and the :save_with option, but couldn't find a way to make Nokogiri behave transparently as shown above.
Any pointers?
OK, my question was answered by Aaron via Twitter/gist:
require 'rubygems'
require 'nokogiri'
doc = Nokogiri::HTML::Document.new
doc.encoding = 'UTF-8'
# We added a contextual fragment method for the 1.4.2 release. This *might*
# work in 1.4.1. If you want to mess with 1.4.2, build from my github, or
# grab one of our nightly builds:
#
# $ sudo gem install nokogiri -s http://tenderlovemaking.com/
#
# Also, libxml2 had a bug with encoding when handling UTF-8 fragments, so I
# suggest you also upgrade to libxml2 2.7.7.
#
# Hope that helps!
puts doc.fragment('<p>ö</p>')