Ruby incorrect method behavior (possibly depending on charset)
Question
I am getting weird behavior from Ruby (in irb):
irb(main):002:0> pp " LS 600"
"\302\240\302\240\302\240\302\240LS 600"
irb(main):003:0> pp " LS 600".strip
"\302\240\302\240\302\240\302\240LS 600"
That means (for those who don't understand) that the strip method does not affect this string at all; the same goes for gsub('/\s+/', '').
How can I strip that string (I got it while parsing a web page)?
Recommended answer
The string "\302\240" is the UTF-8 encoding (bytes C2 A0) of Unicode code point U+00A0, which represents a non-breaking space character. There are many other Unicode space characters. Unfortunately, the String#strip method removes none of these.
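To see this for yourself, the following is a minimal sketch, assuming Ruby 1.9 or later (where strings carry an encoding). The variable names are only illustrative. It inspects the leading bytes and code point of the string from the question and checks how strip, \s, and \p{Space} treat the no-break space.

```ruby
# Assumption: a Ruby with encoding-aware strings (1.9 or later).
# dup + force_encoding makes the UTF-8 interpretation explicit.
s = "\302\240\302\240LS 600".dup.force_encoding(Encoding::UTF_8)

leading_bytes   = s.bytes.first(2)      # the two bytes 0xC2 0xA0
first_codepoint = s.codepoints.first    # U+00A0, the no-break space

# String#strip only trims ASCII whitespace (and NUL), so nothing is removed here.
strip_changed = (s.strip != s)

# \s is ASCII-only in Ruby regexes; \p{Space} also covers U+00A0.
matches_ascii_space   = !!(s[0] =~ /\s/)
matches_unicode_space = !!(s[0] =~ /\p{Space}/)

puts leading_bytes.map { |b| b.to_s(16) }.join(" ")
puts format("U+%04X", first_codepoint)
puts strip_changed, matches_ascii_space, matches_unicode_space
```

This also explains why a \s-based gsub does not help: \s never matches U+00A0, regardless of how the pattern is written.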
If you use Ruby 1.9.2, then you can solve this in the following way:
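# Ruby 1.9.2 or later (not 1.8.7).
# Remove any whitespace-like characters from the beginning/end of the string.
# \A and \z anchor to the whole string; ^ and $ would also match at every line break.
"\302\240\302\240LS 600".gsub(/\A\p{Space}+|\p{Space}+\z/, "")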
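If you need this in more than one place, the same idea can be wrapped in a small helper. This is only a sketch, assuming Ruby 1.9.2 or later and UTF-8 input; the method name unicode_strip is hypothetical and not part of Ruby. It anchors with \A and \z so that only the two ends of the whole string are trimmed, and whitespace inside the string is left alone.

```ruby
# Hypothetical helper (not a core Ruby method): strips Unicode whitespace,
# including U+00A0, from both ends of a UTF-8 string.
def unicode_strip(str)
  # \p{Space} matches any Unicode whitespace character;
  # \A and \z match only at the very start and very end of the string.
  str.gsub(/\A\p{Space}+|\p{Space}+\z/, "")
end

cleaned = unicode_strip("\u00A0\u00A0LS 600")
puts cleaned.inspect
```

Unlike String#strip, this returns a new string with no-break spaces removed from the ends, while the ordinary space between "LS" and "600" is preserved.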
In Ruby 1.8.7, support for Unicode is not as good. You might be successful if you can depend on Rails's ActiveSupport::Multibyte. This has the advantage of giving you a working strip method for free. Install ActiveSupport with gem install activesupport and then try this:
# Ruby 1.8.7/1.9.2.
$KCODE = "u"
require "rubygems"
require "active_support/core_ext/string/multibyte"
# Remove any whitespace-like characters from beginning/end.
"\302\240\302\240LS 600".mb_chars.strip.to_s