Is there a way in Ruby 1.9 to remove invalid byte sequences from strings?
Question
Suppose you have a string like "€foo\xA0", encoded as UTF-8. Is there a way to remove the invalid byte sequences from this string (so that you get "€foo")?
In Ruby 1.8 you could use Iconv.iconv('UTF-8//IGNORE', 'UTF-8', "€foo\xA0"), but Iconv is now deprecated. "€foo\xA0".encode('UTF-8') does nothing, since the string is already tagged as UTF-8. I tried:
"€foo\xA0".force_encoding('BINARY').encode('UTF-8', :undef => :replace, :replace => '')
which yields
"foo"
but that also drops the valid multibyte character €, because every byte above 0x7F is undefined when converting from BINARY to UTF-8.
Solution
"€foo\xA0".chars.select(&:valid_encoding?).join