Character Encoding issue in Rails v3/Ruby 1.9.2

Problem description

I sometimes get the error "invalid byte sequence in UTF-8" when I read contents from a file. This only happens when the string contains certain special characters. I have tried opening the file without "r:UTF-8", but I still get the same error.

open(file, "r:UTF-8").each_line { |line| puts line.split(",") } # line.split raises the error
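The error can be reproduced without the original file. The sketch below assumes the problem bytes are ISO-8859-1, where 0xE4 is "ä". That is an assumption, because the real file's encoding is unknown. A string whose bytes are not valid UTF-8 can still be labelled UTF-8 without any complaint. It only fails once a method such as split has to scan it character by character.

```ruby
# Hypothetical sample line: "\xE4" is the ISO-8859-1 byte for "a umlaut",
# which is not a complete UTF-8 sequence on its own (assumed encoding).
line = "290919,\"SE\",\"26\",\"Sk\xE4l\"".force_encoding("UTF-8")

# The string is labelled UTF-8, but its bytes are not valid UTF-8.
puts line.encoding          # prints UTF-8
puts line.valid_encoding?   # prints false

# Splitting has to walk the string as UTF-8, so it rejects the broken byte.
begin
  line.split(",")
rescue ArgumentError => e
  puts e.message            # prints "invalid byte sequence in UTF-8"
end
```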

Contents of the file:

# encoding: UTF-8
290919,"SE","26","Sk‰l","",59.4500,17.9500,, # this errors out
290956,"CZ","45","HornÌ Bradlo","",49.8000,15.7500,, # this errors out
290958,"NO","02","Svaland","",58.4000,8.0500,, # this works

This is a CSV file I received from an outside source, and I am trying to import it into my DB. It did not come with "# encoding: UTF-8" at the top. I added that line because I read somewhere that it would fix this problem, but it did not. :(
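This sketch shows why the added line does not help. It writes a hypothetical scratch file under Dir.tmpdir containing ISO-8859-1 bytes (an assumed encoding). Ruby honours "# encoding:" magic comments only in Ruby source files it parses. Inside a data file the comment is just another line of text, and the lines after it keep their raw bytes.

```ruby
require "tmpdir"

# Hypothetical scratch file: the magic comment the asker added, followed by a
# sample line whose "\xE4" byte is "a umlaut" in ISO-8859-1 (assumed encoding).
path = File.join(Dir.tmpdir, "cities_magic_comment.csv")
data = "# encoding: UTF-8\n290919,\"SE\",\"26\",\"Sk\xE4l\"\n"
File.open(path, "wb") { |f| f.write(data.force_encoding("BINARY")) }

# Read the file the same way the question does.
lines = File.open(path, "r:UTF-8") { |f| f.readlines }

puts lines[0].valid_encoding?   # prints true: the comment line is plain ASCII
puts lines[1].valid_encoding?   # prints false: the comment changed nothing
```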

Environment:

    • Rails v3.0.3
    • ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-darwin10.5.0]

    Recommended answer

    Ruby has a notion of an external encoding and an internal encoding for each file. The external encoding is the encoding the file is stored in. The internal encoding, if set, is the one Ruby transcodes strings into as it reads them. This lets you work with the text as UTF-8 in your code even when the file is stored in a more esoteric encoding. If your default external encoding is UTF-8 (which it typically is on Mac OS X), all of your file I/O will be treated as UTF-8 as well. You can check this with File.open('file').external_encoding. When you open your file and pass "r:UTF-8", you are only forcing the same external encoding that Ruby already uses by default.
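This is a minimal sketch of the external/internal distinction. It uses a hypothetical scratch file containing the ISO-8859-1 bytes for "Skäl". Opening with only an external encoding tags the bytes without converting them. Adding an internal encoding after a second colon makes Ruby transcode each line as it is read.

```ruby
require "tmpdir"

# Hypothetical scratch file containing the ISO-8859-1 bytes for "Skal" with a umlaut.
path = File.join(Dir.tmpdir, "encoding_demo.txt")
File.open(path, "wb") { |f| f.write("Sk\xE4l\n".force_encoding("BINARY")) }

# Normally derived from the locale; typically UTF-8 on Mac OS X.
puts Encoding.default_external

# External encoding only: bytes are tagged UTF-8 but not converted.
File.open(path, "r:UTF-8") do |f|
  puts f.external_encoding          # prints UTF-8
  p f.internal_encoding             # prints nil (no transcoding requested)
  p f.gets.valid_encoding?          # prints false
end

# External ISO-8859-1, internal UTF-8: each line is transcoded on read.
File.open(path, "r:ISO-8859-1:UTF-8") do |f|
  puts f.external_encoding          # prints ISO-8859-1
  puts f.internal_encoding          # prints UTF-8
  line = f.gets
  p line.encoding                   # prints #<Encoding:UTF-8>
  p line.valid_encoding?            # prints true
end
```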

    Chances are, your source document isn't in UTF-8, and those non-ASCII characters aren't mapping cleanly to UTF-8. (If they mapped cleanly, you would get the correct characters and no error. If they mapped incorrectly, you would get the wrong characters, but still no error.) What you should do is try to determine the encoding of the source document, then have Ruby transcode the document on read, like so:

    # windows-1251 stands in for the source file's actual encoding
    File.open(file, "r:windows-1251:utf-8") { |f| f.each_line { |line| puts line.split(",") } }
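The same conversion can be done on a string you have already read, using String#encode(destination, source). The sketch below uses hypothetical sample bytes (0xE4, as before). It also illustrates the point made earlier in the answer: decoding with the wrong source encoding raises no error and just produces the wrong characters. That is why windows-1251 should be read as a placeholder for whatever encoding the file really uses.

```ruby
# Hypothetical raw bytes as they might come off disk; 0xE4 is "a umlaut"
# in Windows-1252 and a Cyrillic letter in Windows-1251.
raw = "Sk\xE4l".force_encoding("BINARY")

# String#encode(destination, source) treats the bytes as +source+ and
# transcodes them to +destination+, like the "r:source:utf-8" open mode.
as_1252 = raw.encode("UTF-8", "Windows-1252")
as_1251 = raw.encode("UTF-8", "Windows-1251")

puts as_1252                  # prints Skäl
puts as_1251                  # prints Skдl (wrong letter, but no error)
puts as_1251.valid_encoding?  # prints true
```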
    

    If you need help determining the encoding of the source, give this Python library a whirl. It's based on the automatic charset detection fallback that was in Seamonkey/Mozilla (and is possibly still in Firefox).
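If that library is not an option, a crude first pass can be done in Ruby by checking which candidate encodings the bytes are even valid in. The helper below is hypothetical and is not a statistical detector like the one referenced above. It can rule out UTF-8, but single-byte encodings accept nearly every byte sequence, so it cannot choose among candidates such as ISO-8859-1 and Windows-1251.

```ruby
# Hypothetical helper: keep only the candidate encodings in which +bytes+
# form a valid string. A crude filter, not a statistical detector.
def plausible_encodings(bytes, candidates = %w[UTF-8 Windows-1252 ISO-8859-1 Windows-1251])
  candidates.select do |name|
    bytes.dup.force_encoding(name).valid_encoding?
  end
end

sample = "Sk\xE4l".force_encoding("BINARY")
p plausible_encodings(sample)   # prints ["Windows-1252", "ISO-8859-1", "Windows-1251"]
```

UTF-8 is rejected, but all three single-byte encodings survive. That is exactly the ambiguity a frequency-based detector is built to resolve.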
