How can I convert a string from windows-1252 to utf-8 in Ruby?


Problem description



I'm migrating some data from MS Access 2003 to MySQL 5.0 using Ruby 1.8.6 on Windows XP (writing a Rake task to do this).

It turns out the Windows string data is encoded as windows-1252, while Rails and MySQL both assume utf-8 input, so some of the characters, such as apostrophes, are getting mangled. They wind up as "a"s with an accent over them and stuff like that.

Does anyone know of a tool, library, system, methodology, ritual, spell, or incantation to convert a windows-1252 string to utf-8?

Solution

For Ruby 1.8.6, it appears you can use Ruby Iconv, part of the standard library:

Iconv documentation

According to this helpful article, it appears you can at least purge bytes that are not valid UTF-8 (such as stray win-1252 characters) from your string like so:

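require 'iconv'
# UTF-8 to UTF-8 with //IGNORE drops invalid bytes; the padding space (removed by [0..-2]) guards a bad byte at the very end.
ic = Iconv.new('UTF-8//IGNORE', 'UTF-8')
valid_string = ic.iconv(untrusted_string + ' ')[0..-2]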
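
For readers on a newer Ruby (Iconv was dropped from the standard library in Ruby 2.0), a rough equivalent of this cleanup step can be sketched with String#scrub, which needs Ruby 2.1 or later. The helper name purge_invalid_utf8 and the sample string are assumptions made up for illustration; they are not part of the original answer.

```ruby
# Hypothetical helper (the name is an assumption): drop byte sequences that
# are not valid UTF-8. Relies on String#scrub, available in Ruby 2.1+.
def purge_invalid_utf8(untrusted_string)
  # dup leaves the caller's string untouched; force_encoding only relabels
  # the bytes as UTF-8, and scrub('') removes whatever is invalid.
  untrusted_string.dup.force_encoding(Encoding::UTF_8).scrub('')
end

# 0x92 is the Windows-1252 curly apostrophe; on its own it is invalid UTF-8.
dirty = "it\x92s fine".b
clean = purge_invalid_utf8(dirty)
puts clean
```

Like the Iconv version, this throws the offending bytes away rather than converting them, so it is only a fallback for when the source encoding is unknown.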

One might then attempt to do a full conversion like so:

# Iconv.new takes the target encoding first, then the source encoding.
ic = Iconv.new('UTF-8', 'WINDOWS-1252')
valid_string = ic.iconv(untrusted_string + ' ')[0..-2]
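
The question targets Ruby 1.8.6, but on Ruby 1.9 and later the same full conversion can be sketched without Iconv, using String#force_encoding and String#encode. This is a minimal sketch under that assumption; the helper name windows_1252_to_utf8 and the sample bytes are hypothetical.

```ruby
# Hypothetical helper (the name is an assumption): treat the raw bytes as
# Windows-1252 and transcode them to UTF-8. Needs Ruby 1.9 or later.
def windows_1252_to_utf8(raw)
  # force_encoding relabels the bytes without changing them;
  # encode then performs the actual byte-level conversion.
  raw.dup.force_encoding('Windows-1252').encode('UTF-8')
end

# 0x92 is a curly apostrophe and 0xE9 is an e with acute accent in Windows-1252.
raw = "It\x92s caf\xE9".b
utf8 = windows_1252_to_utf8(raw)
puts utf8
```

Bytes that Windows-1252 leaves unassigned may make encode raise an error; passing options such as undef: :replace to encode is one way to handle them.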
