How to remove all non-ASCII characters from a string in Ruby
Problem description
This seems like a very simple and much-needed operation. I need to remove all non-ASCII characters from a string, e.g. © and the like. See the following example.
#coding: utf-8
s = " Hello this a mixed string © that I made."
puts s.encoding
puts s.encode
Output:
UTF-8
Hello this a mixed string ┬⌐ that I made.
When I feed this to Watir, it produces the following error: incompatible character encodings: UTF-8 and ASCII-8BIT
So my problem is that I want to get rid of all non-ASCII characters before using the string. I will not know which encoding the source string "s" uses.
I have been searching and experimenting for quite some time now.
If I try to use
puts s.encode('ASCII-8BIT')
it gives the error:
: "\xC2\xA9" from UTF-8 to ASCII-8BIT (Encoding::UndefinedConversionError)
Recommended answer
You can just literally translate what you asked into a Regexp. You wrote:
I want to get rid of all non ASCII characters
We can rephrase that a little bit:
I want to substitute all characters which don't have the ASCII property with nothing
And that's a statement that can be directly expressed in a Regexp:
s.gsub!(/\P{ASCII}/, '')
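To make the Regexp approach concrete, here is a minimal sketch. The helper name strip_non_ascii and the sample string are hypothetical and only serve as illustration. The sketch uses gsub rather than gsub!, because gsub! returns nil when nothing was substituted, which is easy to trip over when the result is used directly.

```ruby
# Hypothetical helper name; not part of the original answer.
def strip_non_ascii(str)
  # \P{ASCII} matches any single character that does NOT have the ASCII property.
  # gsub (no bang) returns a new string and leaves str unchanged.
  str.gsub(/\P{ASCII}/, '')
end

mixed = 'Hello this a mixed string © that I made.'
cleaned = strip_non_ascii(mixed)

puts cleaned              # the © is gone, the surrounding spaces remain
puts cleaned.ascii_only?  # true once only ASCII characters are left
```

Note that this assumes the string holds valid byte sequences for its tagged encoding; matching a Regexp against a string with invalid bytes raises an ArgumentError.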
As an alternative, you could also use String#delete!:
s.delete!("^\u{0000}-\u{007F}")
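The String#delete variant can be sketched the same way. The helper name delete_non_ascii and the sample string are hypothetical. The sketch uses the non-destructive delete, since delete! (like gsub!) returns nil when the string was not modified.

```ruby
# Hypothetical helper name; not part of the original answer.
def delete_non_ascii(str)
  # The leading ^ negates the character set, so everything outside the
  # range U+0000..U+007F is deleted. delete (no bang) returns a new string.
  str.delete("^\u{0000}-\u{007F}")
end

mixed = 'Hello this a mixed string © that I made.'

puts delete_non_ascii(mixed)
# Both approaches should agree on the same input.
puts delete_non_ascii(mixed) == mixed.gsub(/\P{ASCII}/, '')
```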