Removing accents/diacritics from string while preserving other special chars (tried mb_chars.normalize and iconv)


Problem description

There is a very similar question already. One of the solutions uses code like this one:

string.mb_chars.normalize(:kd).gsub(/[^x00-\x7F]/n, '').to_s

Which works wonders, until you notice it also removes spaces, dots, dashes, and who knows what else.

I'm not really sure how the first code works, but could it be made to strip only accents? Or at the very least be given a list of chars to preserve? My knowledge of regexps is limited, but I tried (to no avail):

/[^\-x00-\x7F]/n # So it would leave the dash alone

I'm about to do something like this:

string.mb_chars.normalize(:kd).gsub('-', '__DASH__')
  .gsub(/[^x00-\x7F]/n, '').gsub('__DASH__', '-').to_s

Heinous? Yes...

I also tried:

iconv = Iconv.new('UTF-8', 'US-ASCII//TRANSLIT') # Also tried ISO-8859-1
iconv.iconv 'Café' # Throws an error: Iconv::IllegalSequence: "é"

Help, please?

Recommended answer


it also removes spaces, dots, dashes, and who knows what else.

It shouldn't.

string.mb_chars.normalize(:kd).gsub(/[^x00-\x7F]/n, '').to_s

You've mistyped: there should be a backslash before the x00, so that it refers to the NUL character.

/[^\-x00-\x7F]/n # So it would leave the dash alone

You've put the ‘-’ between the ‘\’ and the ‘x’, which breaks the reference to the null character and thus breaks the range.
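To make the difference concrete, here is a minimal sketch comparing the corrected character class with the mistyped one. It makes a few assumptions that are not in the original post: it uses Ruby's built-in String#unicode_normalize(:nfkd) in place of ActiveSupport's mb_chars.normalize(:kd) so that it runs without Rails, it drops the /n flag, and strip_accents is a hypothetical helper name. Without the backslash, [^x00-\x7F] means "anything except x, 0, or the range 0 through \x7F", so every character below "0" in ASCII (space, dot, dash and so on) gets deleted too.

```ruby
# Sketch only: String#unicode_normalize (Ruby stdlib) stands in for
# ActiveSupport's mb_chars.normalize(:kd); strip_accents is a hypothetical name.
def strip_accents(string)
  # NFKD splits "é" into "e" plus a combining accent (U+0301);
  # the corrected class then drops every non-ASCII character.
  string.unicode_normalize(:nfkd).gsub(/[^\x00-\x7F]/, '')
end

sample = 'Café - déjà vu.'

fixed = strip_accents(sample)

# The mistyped class [^x00-\x7F] keeps only "x" and the range "0".."\x7F",
# so everything below "0" (space, dot, dash, ...) is deleted as well.
mistyped = sample.unicode_normalize(:nfkd).gsub(/[^x00-\x7F]/, '')

puts fixed
puts mistyped
```

With this sample the first line printed is "Cafe - deja vu." and the second is "Cafedejavu", which matches the behaviour described in the question.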
