How to make Nokogiri transparently return un/encoded HTML entities untouched?


Question

How can I use Nokogiri while leaving HTML entities (such as German umlauts) untouched?

I.e.:

# this is fine
node = Nokogiri::HTML.fragment('<p>&ouml;</p>')
node.to_s # => '<p>&ouml;</p>'

# this is not
node = Nokogiri::HTML.fragment('<p>ö</p>')
node.to_s # => '<p>&ouml;</p>'

# this is what I need
node = Nokogiri::HTML.fragment('<p>ö</p>')
node.to_s # => '<p>ö</p>'

I've experimented with both PARSE_OPTIONS and the :save_with option, but couldn't find a way to make Nokogiri behave transparently as shown above.

Any pointers?

Solution

OK, my question was answered by Aaron via Twitter/gist:

require 'rubygems'
require 'nokogiri'

doc = Nokogiri::HTML::Document.new
doc.encoding = 'UTF-8'

# We added a contextual fragment method for the 1.4.2 release. This *might*
# work in 1.4.1. If you want to mess with 1.4.2, build from my github, or
# grab one of our nightly builds:
#
# $ sudo gem install nokogiri -s http://tenderlovemaking.com/
#
# Also, libxml2 had a bug with encoding when handling UTF-8 fragments, so I
# suggest you also upgrade to libxml2 2.7.7.
#
# Hope that helps!
puts doc.fragment('<p>ö</p>')
