I can't remove whitespaces from a string parsed by Nokogiri


Question

I can't remove whitespace from a string.

My HTML is:

<p class='your-price'>
Cena pro Vás: <strong>139&nbsp;<small>Kč</small></strong>
</p>

My code is:

#encoding: utf-8
require 'rubygems'
require 'mechanize'

agent = Mechanize.new
site  = agent.get("http://www.astratex.cz/podlozky-pod-raminka/doplnky")
price = site.search("//p[@class='your-price']/strong/text()")

val = price.first.text  # => "139 "
val.strip               # => "139 "
val.gsub(" ", "")       # => "139 "

gsub, strip, etc. don't work. Why, and how do I fix this?

val.class      # => String
val.dump       # => "\"139\\u{a0}\""      (note the \u{a0})
val.encoding   # => #<Encoding:UTF-8>

__ENCODING__               # => #<Encoding:UTF-8>
Encoding.default_external  # => #<Encoding:UTF-8>

I'm using Ruby 1.9.3, so Unicode shouldn't be a problem.

Recommended answer

strip only removes ASCII whitespace, and the character you've got here is a Unicode non-breaking space (U+00A0, the \u{a0} shown by val.dump). It comes from the &nbsp; entity in your HTML.
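To see why strip and gsub(" ", "") appear to do nothing, here is a minimal sketch that builds the same kind of string by hand instead of fetching the page. The literal "139\u00A0" is an assumed stand-in for the value Nokogiri returned in the question.

```ruby
# encoding: utf-8
# Assumed stand-in for the parsed value: "139" followed by U+00A0 (non-breaking space).
val = "139\u00A0"

# Inspecting the codepoints makes the invisible character visible: 160 is 0xA0.
codepoints = val.codepoints.to_a   # => [49, 51, 57, 160]

# strip only trims ASCII whitespace (and NUL), so the trailing U+00A0 survives.
stripped = val.strip

# gsub(" ", "") targets U+0020 only, which is a different character from U+00A0.
ascii_removed = val.gsub(" ", "")

puts stripped == val        # true: nothing was stripped
puts ascii_removed == val   # true: nothing was replaced
```

Checking val.codepoints (or val.dump, as in the question) is a quick way to confirm which whitespace character is actually present before choosing a fix.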

Removing the character is easy. You can use gsub with a regex that matches the character by its code point:

gsub(/\u00a0/, '')

You could also call

gsub(/[[:space:]]/, '')

to remove all Unicode whitespace. For details, check the Regexp documentation.
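The two gsub calls above can be compared side by side in a short sketch. The sample strings are assumptions made up for illustration, and the final trimming pattern is a variant that does not appear in the answer: it removes Unicode whitespace only at the ends of the string, which may be preferable when inner spaces should be kept.

```ruby
# encoding: utf-8
# Assumed sample values, not taken from the live page.
val   = "139\u00A0"
mixed = " 1 390\u00A0"

# Remove only the non-breaking space, matched by its code point.
only_nbsp = val.gsub(/\u00a0/, '')

# Remove every Unicode whitespace character, including ordinary inner spaces.
all_space = mixed.gsub(/[[:space:]]/, '')

# Variant (not from the answer): trim Unicode whitespace at both ends only.
trimmed = mixed.gsub(/\A[[:space:]]+|[[:space:]]+\z/, '')

puts only_nbsp        # 139
puts all_space        # 1390
puts trimmed          # 1 390
puts only_nbsp.to_i   # 139
```

Note that \s in a Ruby regex matches only ASCII whitespace, so /\s/ would not match U+00A0 here. The POSIX bracket class [[:space:]] is the Unicode-aware form.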
