I can't remove whitespace from a string parsed by Nokogiri
Problem description
I can't remove whitespace from a string.
My HTML is:
<p class='your-price'>
Cena pro Vás: <strong>139 <small>Kč</small></strong>
</p>
My code is:
#encoding: utf-8
require 'rubygems'
require 'mechanize'
agent = Mechanize.new
site = agent.get("http://www.astratex.cz/podlozky-pod-raminka/doplnky")
price = site.search("//p[@class='your-price']/strong/text()")
val = price.first.text  # => "139 "
val.strip               # => "139 "
val.gsub(" ", "")       # => "139 "
gsub, strip, etc. don't work. Why, and how do I fix this?
val.class => String
val.dump => "\"139\\u{a0}\"" !
val.encoding => #<Encoding:UTF-8>
__ENCODING__ => #<Encoding:UTF-8>
Encoding.default_external => #<Encoding:UTF-8>
I'm using Ruby 1.9.3, so Unicode shouldn't be a problem.
Recommended answer
strip only removes ASCII whitespace, and the character you've got here is a Unicode non-breaking space (U+00A0, shown as \u{a0} in your dump output).
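A minimal reproduction of the problem without fetching the page, assuming the value is exactly "139" followed by U+00A0 as the dump output suggests:

```ruby
# encoding: utf-8
# The value from the question: "139" followed by a non-breaking space (U+00A0).
val = "139\u00a0"

# strip trims ASCII whitespace such as space, tab and newline,
# so the non-breaking space survives and the string is unchanged.
stripped = val.strip
puts stripped == val       # prints true

# The last character is U+00A0, not an ordinary ASCII space (U+0020).
puts val[-1].ord.to_s(16)  # prints a0
```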
Removing the character is easy. You can use gsub by providing a regex with the character code:
gsub(/\u00a0/, '')
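Applied to the value from the question (assumed here to be "139" plus U+00A0), the call above yields a clean string. Because only a single character is involved, String#delete is a regex-free alternative that gives the same result:

```ruby
# encoding: utf-8
val = "139\u00a0"  # "139" + non-breaking space, as in the question

# \u00a0 inside a regexp literal matches exactly that code point.
cleaned = val.gsub(/\u00a0/, '')
puts cleaned.dump  # prints "139"

# String#delete removes every occurrence of the given character.
alt = val.delete("\u00a0")
puts alt == cleaned  # prints true
```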
You can also call
gsub(/[[:space:]]/, '')
to remove all Unicode whitespace. For details, check the Regexp documentation.
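The sketch below contrasts the two approaches on a hypothetical price string (not taken from the page) that has non-breaking spaces both inside and at the end. The second pattern, anchored with \A and \z, is an assumption about what you might want if you only need strip-like trimming of the ends while still catching U+00A0:

```ruby
# encoding: utf-8
# Hypothetical input: non-breaking spaces as a thousands separator and at the end.
text = "1\u00a0390\u00a0Kč\u00a0"

# \s only matches ASCII whitespace, so it finds nothing here.
puts(text =~ /\s/).inspect  # prints nil

# [[:space:]] matches Unicode whitespace, including U+00A0,
# so this removes every one of them.
all_removed = text.gsub(/[[:space:]]/, '')
puts all_removed  # prints 1390Kč

# strip-like behavior: trim Unicode whitespace only at the start and end,
# keeping the separators inside the string.
trimmed = text.gsub(/\A[[:space:]]+|[[:space:]]+\z/, '')
puts trimmed.length  # prints 8
```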