Removing accents/diacritics from string while preserving other special chars (tried mb_chars.normalize and iconv)
Question
There is a very similar question already. One of the solutions uses code like this one:
string.mb_chars.normalize(:kd).gsub(/[^x00-\x7F]/n, '').to_s
Which works wonders, until you notice it also removes spaces, dots, dashes, and who knows what else.
I'm not really sure how the first code works, but could it be made to strip only accents? Or at the very least be given a list of chars to preserve? My knowledge of regexps is small, but I tried (to no avail):
/[^\-x00-\x7F]/n # So it would leave the dash alone
I'm about to do something like this:
string.mb_chars.normalize(:kd).gsub('-', '__DASH__').gsub(/[^x00-\x7F]/n, '').gsub('__DASH__', '-').to_s
Ugly? Yes...
I also tried:
iconv = Iconv.new('UTF-8', 'US-ASCII//TRANSLIT') # Also tried ISO-8859-1
iconv.iconv 'Café' # Throws an error: Iconv::IllegalSequence: "é"
Help, please?
Recommended answer
it also removes spaces, dots, dashes, and who knows what else.
It shouldn't.
string.mb_chars.normalize(:kd).gsub(/[^x00-\x7F]/n, '').to_s
You've mistyped, there should be a backslash before the x00, to refer to the NUL character.
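To see what the missing backslash does, here is a minimal sketch. It assumes Ruby 2.2+ and uses core Ruby's `String#unicode_normalize(:nfkd)` as a stand-in for ActiveSupport's `mb_chars.normalize(:kd)`; the variable names and the sample string are made up for illustration, and the `/n` flag is left out because it isn't needed here:

```ruby
# Stand-in for mb_chars.normalize(:kd): NFKD normalization splits an
# accented letter into the plain letter plus a combining mark (e.g. U+0301).
sample = "Caf\u00E9 - d\u00E9j\u00E0 vu."
decomposed = sample.unicode_normalize(:nfkd)

# Typo from the question: without the backslash the class holds the literal
# "x", the literal "0", and the range "0".."\x7F". Negated, it also matches
# everything below "0", which includes space, "-" and ".".
# (Ruby may print a "duplicated range" warning for this class.)
mistyped = /[^x00-\x7F]/

# Corrected: the range runs from NUL to DEL, i.e. all of ASCII, so only the
# non-ASCII combining marks are matched and removed.
corrected = /[^\x00-\x7F]/

mistyped_result  = decomposed.gsub(mistyped, '')
corrected_result = decomposed.gsub(corrected, '')

puts mistyped_result   # => Cafedejavu
puts corrected_result  # => Cafe - deja vu.
```

With the backslash in place, spaces, dots and dashes are all inside the ASCII range and are left alone.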
/[^\-x00-\x7F]/n # So it would leave the dash alone
You've put the ‘-’ between the ‘\’ and the ‘x’, which will break the reference to the null character, and thus break the range.
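A rough sketch of what that second pattern ends up matching, again assuming Ruby 2.2+ with `String#unicode_normalize(:nfkd)` standing in for `mb_chars.normalize(:kd)`. The last part is not from the answer: it is one possible alternative, removing only combining marks (`\p{Mn}`) so that other non-ASCII characters survive. All variable names and sample strings are hypothetical:

```ruby
decomposed = "Caf\u00E9 - d\u00E9j\u00E0 vu.".unicode_normalize(:nfkd)

# The question's second attempt: "\-" is a literal dash, followed by the
# literal "x", the literal "0", and the range "0".."\x7F". The dash now
# survives, but spaces and dots are still below "0" and still get removed.
# (Ruby may print a "duplicated range" warning for this class.)
dash_attempt = /[^\-x00-\x7F]/
dash_result = decomposed.gsub(dash_attempt, '')
puts dash_result   # => Cafe-dejavu

# Alternative (an assumption, not part of the answer): strip only the
# combining marks, Unicode category Mn, and keep every other character,
# including non-ASCII ones such as an en dash or a euro sign.
marks_only = "Caf\u00E9 \u2013 5\u20AC".unicode_normalize(:nfkd).gsub(/\p{Mn}/u, '')
puts marks_only    # prints "Cafe", an en dash, then "5" and the euro sign
```

Note that NFKD also rewrites compatibility characters (ligatures, fractions and the like), so the alternative is only a fit if that side effect is acceptable.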