How to remove all non-ASCII characters from a string in Ruby

Problem description

It seems like this should be a very simple and commonly needed method: I need to remove all non-ASCII characters, e.g. ©, from a string. See the following example.

#coding: utf-8
s = " Hello this a mixed string © that I made."
puts s.encoding
puts s.encode

Output:

UTF-8
Hello this a mixed string ┬⌐ that I made.
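The ┬⌐ in that output is most likely a display artifact rather than damage to the string itself: in UTF-8, © is stored as the two bytes 0xC2 0xA9, and a console that decodes output with a legacy single-byte code page shows those bytes as two separate characters. The sketch below assumes the console uses code page 437 (a common Windows default; the question does not say which console was used) and reproduces the effect:

```ruby
# coding: utf-8
s = "©"

# In UTF-8 the copyright sign is encoded as two bytes.
p s.bytes.map { |b| format("0x%02X", b) }  # => ["0xC2", "0xA9"]

# Assumption: the console decodes output as IBM437 (code page 437).
# Reinterpret the same two bytes in that encoding, then transcode back
# to UTF-8 so the result can be printed correctly here.
as_seen = s.dup.force_encoding("IBM437").encode("UTF-8")
puts as_seen  # => ┬⌐
```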

When I feed this to Watir, it produces the following error: incompatible character encodings: UTF-8 and ASCII-8BIT
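That message is Ruby's Encoding::CompatibilityError, raised when two strings are combined and both contain non-ASCII bytes in different encodings. The question does not show Watir's internals, so the sketch below only assumes that somewhere your UTF-8 string gets combined with binary (ASCII-8BIT) data; the binary string is a hypothetical stand-in for that data. It also shows why stripping non-ASCII characters helps: an ASCII-only string is compatible with any ASCII-compatible encoding.

```ruby
# coding: utf-8
text   = "Hello this a mixed string © that I made."
binary = "\xFF".b  # hypothetical binary data; .b returns an ASCII-8BIT copy

begin
  text + binary
rescue Encoding::CompatibilityError => e
  # Both sides contain non-ASCII bytes, so Ruby cannot pick a common encoding.
  puts "#{e.class}: #{e.message}"
end

# Once the UTF-8 string is ASCII-only, the concatenation succeeds and the
# result takes the encoding of the side that still has non-ASCII bytes.
ascii_text = text.gsub(/\P{ASCII}/, "")
combined   = ascii_text + binary
puts combined.encoding
```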

So I want to get rid of all non-ASCII characters before using the string. I won't know in advance which encoding the source string "s" uses.

I have been searching and experimenting for quite some time now.

If I try to use

  puts s.encode('ASCII-8BIT')

it gives the error:

  "\xC2\xA9" from UTF-8 to ASCII-8BIT (Encoding::UndefinedConversionError)
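The UndefinedConversionError is raised because © has no representation in the target encoding. String#encode accepts options that substitute such characters instead of raising. A minimal sketch of that alternative, which targets US-ASCII rather than the ASCII-8BIT tried above and replaces both undefined characters and invalid byte sequences with an empty string:

```ruby
# coding: utf-8
s = " Hello this a mixed string © that I made."

# Transcode to US-ASCII. Characters with no US-ASCII equivalent (undef) and
# malformed byte sequences (invalid) become "" instead of raising
# Encoding::UndefinedConversionError / Encoding::InvalidByteSequenceError.
ascii = s.encode("US-ASCII", invalid: :replace, undef: :replace, replace: "")

p ascii           # => " Hello this a mixed string  that I made."
p ascii.encoding  # => #<Encoding:US-ASCII>
```

The result is tagged US-ASCII, which is ASCII-compatible, so it can be combined with ASCII-8BIT strings.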

Recommended answer

You can just literally translate what you asked into a Regexp. You wrote:

I want to get rid of all non-ASCII characters

We can rephrase that a little bit:

I want to substitute all characters which don't have the ASCII property with nothing

And that's a statement that can be directly expressed in a Regexp:

s.gsub!(/\P{ASCII}/, '')
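A runnable sketch of the regex approach. \P{ASCII} is the negated form of the \p{ASCII} character property, so it matches every character outside the ASCII range. Note that gsub! modifies the string in place and returns nil when nothing was replaced, so when you need the result as a value, the non-destructive gsub is safer:

```ruby
# coding: utf-8
s = " Hello this a mixed string © that I made.".dup  # mutable copy of the literal

# Non-destructive: returns a new string and leaves s unchanged.
cleaned = s.gsub(/\P{ASCII}/, "")
p cleaned        # => " Hello this a mixed string  that I made."
p s.ascii_only?  # => false

# Destructive: modifies s in place.
s.gsub!(/\P{ASCII}/, "")
p s.ascii_only?  # => true

# gsub! returns nil when there is nothing left to replace.
p s.gsub!(/\P{ASCII}/, "")  # => nil
```

The cleaned string keeps its UTF-8 encoding, but because it is ASCII-only it is compatible with ASCII-8BIT strings.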

As an alternative, you could also use String#delete!:

s.delete!("^\u{0000}-\u{007F}")
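String#delete takes character-set arguments in which a leading ^ negates the set and a-b denotes a range, so "^\u{0000}-\u{007F}" means every character outside U+0000..U+007F. A sketch comparing it with the regex version (like gsub!, delete! returns nil when nothing was removed):

```ruby
# coding: utf-8
s = " Hello this a mixed string © that I made."

# Delete every character that is NOT in the range U+0000..U+007F.
by_delete = s.delete("^\u{0000}-\u{007F}")
by_regex  = s.gsub(/\P{ASCII}/, "")

p by_delete              # => " Hello this a mixed string  that I made."
p by_delete == by_regex  # => true
```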
