Equivalent of Iconv.conv("UTF-8//IGNORE",...) in Ruby 1.9.X?


Question

I'm reading data from a remote source, and occasionally get some characters in another encoding. They're not important.

I'd like to get a "best guess" UTF-8 string, and ignore the invalid data.

Main goal is to get a string I can use, and not run into errors such as:

  • Encoding::UndefinedConversionError: "\xFF" from ASCII-8BIT to UTF-8:
  • invalid byte sequence in utf-8

Solution

I thought this was it:

string.encode("UTF-8", :invalid => :replace, :undef => :replace, :replace => "?")

will replace all unknowns with '?'.

To ignore all unknowns, :replace => '':

string.encode("UTF-8", :invalid => :replace, :undef => :replace, :replace => "")
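For illustration, here is a minimal sketch of the two calls above wrapped in a helper. The name `to_utf8_lossy` and the sample input are made up for this example, and it assumes the incoming data can be treated as raw bytes (ASCII-8BIT), as in the first error message in the question.

```ruby
# Hypothetical helper (not from the original post): converts raw bytes
# to UTF-8, substituting `replacement` for anything that cannot be converted.
def to_utf8_lossy(bytes, replacement = "?")
  # Tag a copy as binary so that encode performs a real conversion.
  binary = bytes.dup.force_encoding(Encoding::ASCII_8BIT)
  binary.encode("UTF-8", :invalid => :replace, :undef => :replace, :replace => replacement)
end

raw = "abc\xFFdef"
puts to_utf8_lossy(raw)      # prints "abc?def"
puts to_utf8_lossy(raw, "")  # prints "abcdef"
```

Note that when the source is tagged as binary, every byte above 0x7F is replaced, including bytes that belong to valid UTF-8 multi-byte characters, so this only suits data that is expected to be mostly ASCII.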

Edit:

I'm not sure this is reliable. I've gone into paranoid-mode, and have been using:

string.encode("UTF-8", ...).force_encoding('UTF-8')

The script seems to be running OK now, but I'm pretty sure I got errors with this earlier.

Edit 2:

Even with this, I continue to get intermittent errors. Not every time, mind you. Just sometimes.
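One possible explanation, which the post does not confirm: when a string is already tagged as UTF-8, some older Ruby versions reportedly treat encode("UTF-8", ...) as a no-op, so invalid bytes pass through untouched. The sketch below is one way to guard against that, under these assumptions: `clean_utf8` is a made-up name, `String#scrub` is only available on Ruby 2.1 or later, and the UTF-16BE round trip is a commonly used workaround for older versions rather than anything from the original answer.

```ruby
# Hypothetical helper (not from the original post): returns a valid UTF-8
# copy of `str`, dropping any bytes that are not valid UTF-8.
def clean_utf8(str)
  utf8 = str.dup.force_encoding("UTF-8")
  return utf8 if utf8.valid_encoding?

  if utf8.respond_to?(:scrub)
    # Ruby 2.1+: remove invalid byte sequences directly.
    utf8.scrub("")
  else
    # Older Rubies (assumption): force a real conversion by going through
    # another encoding and back.
    utf8.encode("UTF-16BE", :invalid => :replace, :undef => :replace, :replace => "")
        .encode("UTF-8")
  end
end

puts clean_utf8("abc\xFFdef")  # prints "abcdef"
```

Unlike converting from ASCII-8BIT, this keeps valid multi-byte UTF-8 characters and removes only the broken bytes.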
