How can I convert a string from windows-1252 to utf-8 in Ruby?

Question

I'm migrating some data from MS Access 2003 to MySQL 5.0 using Ruby 1.8.6 on Windows XP (writing a Rake task to do this).

It turns out the Windows string data is encoded as windows-1252, while Rails and MySQL both assume UTF-8 input, so some characters, such as apostrophes, are getting mangled. They wind up as "a"s with an accent over them and stuff like that.
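
A minimal sketch of why this happens, assuming the mangled apostrophes are windows-1252 curly quotes (byte 0x92). It needs Ruby 2.0 or later (for String#b and String#force_encoding), so it will not run on the 1.8.6 setup described here. The same byte is invalid when read as UTF-8 but maps to U+2019 when read as windows-1252.

```ruby
# Assumption: the apostrophe is the windows-1252 "right single quote", byte 0x92.
raw = "it\x92s".b   # binary string holding the raw bytes as they came out of Access

# Read as UTF-8, a lone 0x92 is an invalid byte sequence.
as_utf8 = raw.dup.force_encoding('UTF-8')
puts as_utf8.valid_encoding?     # prints: false

# Read as windows-1252, the same byte is U+2019 (RIGHT SINGLE QUOTATION MARK).
as_cp1252 = raw.dup.force_encoding('Windows-1252')
puts as_cp1252.valid_encoding?   # prints: true

utf8 = as_cp1252.encode('UTF-8')
puts utf8.bytes.map { |b| format('%02X', b) }.join(' ')   # prints: 69 74 E2 80 99 73
```

So the data itself is fine; it just has to be labelled (or converted) with the right encoding before Rails and MySQL see it.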

Does anyone know of a tool, library, system, methodology, ritual, spell, or incantation to convert a windows-1252 string to utf-8?

Recommended answer

For Ruby 1.8.6, it appears you can use Ruby Iconv, part of the standard library:

Iconv documentation

According to this helpful article, it appears you can at least purge unwanted win-1252 characters (that is, any bytes that are not valid UTF-8) from your string like so:

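require 'iconv'
ic = Iconv.new('UTF-8//IGNORE', 'UTF-8')  # //IGNORE drops invalid UTF-8 sequences
valid_string = ic.iconv(untrusted_string + ' ')[0..-2]  # pad, then trim: iconv may not drop an invalid final byte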
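
On newer Rubies, Iconv is deprecated (it was removed from the standard library in Ruby 2.0), so the same "strip invalid bytes" step is usually done with String#scrub, available from Ruby 2.1. A minimal sketch; untrusted_string here is a hypothetical stand-in for a raw value read from Access.

```ruby
# Sketch for Ruby 2.1+ (Iconv is no longer in the standard library).
# Hypothetical input: "café it’s" stored as windows-1252 bytes.
untrusted_string = "caf\xE9 it\x92s".b

# Reinterpret the bytes as UTF-8 (nothing is converted here), then let
# String#scrub delete every invalid sequence by replacing it with ''.
valid_string = untrusted_string.dup.force_encoding('UTF-8').scrub('')

puts valid_string                   # prints: caf its
puts valid_string.valid_encoding?   # prints: true
```

As with the Iconv version, the é and the curly apostrophe are simply thrown away, which is why a real conversion is usually the better choice for windows-1252 data.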

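Alternatively, to convert the win-1252 characters to their UTF-8 equivalents instead of discarding them, one might attempt a full conversion on the original string like so: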

require 'iconv'
ic = Iconv.new('UTF-8', 'WINDOWS-1252')  # Iconv.new(to, from): target encoding first
valid_string = ic.iconv(untrusted_string + ' ')[0..-2]
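
The counterpart of the full conversion on Ruby 2.0 or later is force_encoding followed by String#encode. A minimal sketch, assuming (as in the question) that the bytes really are windows-1252; untrusted_string is again a hypothetical stand-in. The undef: :replace option is there because a few byte values (such as 0x81) are unassigned in windows-1252, and Ruby raises Encoding::UndefinedConversionError for them unless told otherwise.

```ruby
# Sketch for Ruby 2.0+: convert windows-1252 bytes to UTF-8 with String#encode.
# Hypothetical input: "café it’s" stored as windows-1252 bytes.
untrusted_string = "caf\xE9 it\x92s".b

# Label the bytes with their real encoding (no bytes change here)...
cp1252 = untrusted_string.dup.force_encoding('Windows-1252')

# ...then transcode to UTF-8. undef: :replace substitutes '?' for any byte
# that windows-1252 leaves unassigned, instead of raising an error.
valid_string = cp1252.encode('UTF-8', invalid: :replace, undef: :replace, replace: '?')

puts valid_string                   # prints: café it’s
puts valid_string.valid_encoding?   # prints: true
```

Unlike the stripping approach, the é and the curly apostrophe survive as proper UTF-8 characters; the '?' replacement is an arbitrary choice and could just as well be ''.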