Can I set the default string encoding on Ruby 1.9?


Question

This might sound minor, but it's been driving me nuts. Since releasing an application to production on Ruby 1.9 last Friday, I've been getting lots of minor exceptions related to character encodings. Almost all of them are some variation on:

Encoding::CompatibilityError: incompatible character encodings: ASCII-8BIT and UTF-8

We have an international user base, so plenty of names contain umlauts and the like. If I fix the templates by calling force_encoding in a bunch of places, the error pops up in the flash message helper instead. Et cetera.

At the moment it looks like I've nailed down all the ones I knew about, by patching ActiveSupport's string concatenation in one place and by putting # encoding: utf-8 at the top of every one of my source files. But the feeling that I might have to remember to do that for every file of every Ruby project I ever do from now on, forever, just to avoid string assignment problems, does not sit well in my stomach. I read about the -Ku switch, but everything seems to warn that it's for backwards compatibility and might go away at any time.

So my question for 1.9-experienced folks: is setting # encoding in every one of my files really necessary? Is there a reasonable way to do this globally? Or, better, is there a way to set the default encoding on non-literal string values that bypasses the internal/external defaults?

Thanks in advance for any suggestions.

Recommended answer

Don't confuse file encoding with string encoding

The purpose of the # encoding comment at the top of a file is to tell Ruby, while it reads and interprets your code, and your editor, while you edit the file, how to handle any non-ASCII characters. It is only necessary if the file contains at least one non-ASCII character; for example, it's necessary in your config/locale files.
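As a minimal sketch of what the magic comment affects (the variable name is only illustrative): on Ruby 1.9 a file containing a non-ASCII literal needs the comment on its first line, or on the second line after a shebang, and string literals in that file are then tagged with that encoding. Ruby 2.0 and later assume UTF-8 source by default, so there the comment is redundant for UTF-8 files.

```ruby
# encoding: utf-8
# The magic comment has to come first in the file (or second, after a shebang).

greeting = "Grüße"       # non-ASCII literal; Ruby 1.9 rejects it without the magic comment

puts __ENCODING__        # source encoding of this file
puts greeting.encoding   # string literals inherit the source encoding
puts greeting.length     # counted in characters
puts greeting.bytesize   # counted in bytes
```

The comment only describes the source file. It does not change the encoding of strings that arrive at runtime from a database, a socket, or user input.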

To define the encoding in all your files at once, you can use the magic_encoding gem, which can insert the utf-8 magic comment into all the Ruby files in your app.

The Encoding::CompatibilityError you're getting at runtime is raised when you try to concatenate two Strings with different encodings during program execution, and those encodings are incompatible.
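A small sketch that reproduces the error may help. The variable names and sample values are made up for illustration; the binary string stands in for data that some driver or socket handed back tagged as ASCII-8BIT.

```ruby
# encoding: utf-8

utf8_name   = "Jürgen"                                         # UTF-8 literal
binary_name = "M\xC3\xBCller".dup.force_encoding("ASCII-8BIT") # same bytes, tagged as binary

# Both strings contain non-ASCII bytes and carry different encodings,
# so Ruby refuses to join them.
raised = false
begin
  utf8_name + binary_name
rescue Encoding::CompatibilityError => e
  raised = true
  puts "#{e.class}: #{e.message}"
end

# A string holding only 7-bit ASCII bytes is compatible with any
# ASCII-compatible encoding, so this concatenation succeeds.
ascii_binary = "Hello, ".dup.force_encoding("ASCII-8BIT")
combined = ascii_binary + utf8_name
puts combined.encoding
```

The second half shows why the problem tends to surface only in production: pure-ASCII test data concatenates without complaint, and the exception appears once a real name with an umlaut comes through.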

This will most likely happen when:

  • you are using L10N strings (e.g. UTF-8) and concatenate them with, e.g., an ASCII string in your view

  • the user types in a string in a foreign language (e.g. UTF-8), and your view tries to print it out along with some fixed string that you pre-defined (ASCII). force_encoding will help there. There's also Encoding::primary_encoding in Ruby 1.9 to set the default encoding for new Strings, and there is config.encoding in the Rails config/application.rb file.

  • Strings that come from your database are combined with other Strings in your view (their encodings could be either way around, and incompatible).
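The cases above can be sketched as follows. The helper to_utf8 is hypothetical (it is not part of Ruby or Rails) and assumes that binary-tagged data is really UTF-8 whenever its bytes form valid UTF-8. Encoding.default_external and Encoding.default_internal are the standard Ruby settings for data read through IO; they are shown here as one possible global setting, not as something the answer prescribes.

```ruby
# encoding: utf-8

# Hypothetical helper: returns a UTF-8 version of str.
def to_utf8(str)
  if str.encoding == Encoding::ASCII_8BIT
    # force_encoding only changes the label, never the bytes,
    # so verify that the bytes really are valid UTF-8.
    retagged = str.dup.force_encoding(Encoding::UTF_8)
    return retagged if retagged.valid_encoding?
  end
  # Otherwise transcode, replacing anything that cannot be converted.
  str.encode(Encoding::UTF_8, :invalid => :replace, :undef => :replace)
end

from_db = "M\xC3\xBCller".dup.force_encoding("ASCII-8BIT") # stand-in for a database value
message = "Welcome, " + to_utf8(from_db)
puts message.encoding
puts message

# Process-wide defaults for data read from files, sockets and STDIN.
# They do not affect the encoding of string literals.
Encoding.default_external = Encoding::UTF_8
Encoding.default_internal = Encoding::UTF_8

# In a Rails app the setting mentioned above lives in config/application.rb:
#   config.encoding = "utf-8"
```

Normalizing strings once, at the point where they enter the application, is usually easier to maintain than scattering force_encoding calls through the templates.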

Side note: make sure to specify a default encoding when you create your database!

    create database yourproject  DEFAULT CHARACTER SET utf8;

If you want to use emoji in your strings:

    create database yourproject DEFAULT CHARACTER SET utf8mb4 collate utf8mb4_bin;

In that case, all indexes on string columns that may contain emoji need to be limited to 191 characters in length (CHARACTER SET utf8mb4 COLLATE utf8mb4_bin).

The reason for this is that MySQL's plain utf8 uses up to 3 bytes per character, whereas emoji need 4 bytes of storage.
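The byte counts, and where 191 comes from, can be checked with a few lines of Ruby. The 767-byte figure is an assumption not stated in the answer: it is the index key limit of older InnoDB row formats, and newer MySQL configurations may allow longer keys.

```ruby
# encoding: utf-8

samples = { "a" => "ASCII letter", "ü" => "umlaut", "€" => "euro sign", "😀" => "emoji" }
samples.each do |char, label|
  puts "#{label}: #{char.bytesize} byte(s) in UTF-8"
end

index_limit_bytes = 767   # assumption: key limit of older InnoDB row formats

puts index_limit_bytes / 3   # indexable characters at 3 bytes each (utf8)
puts index_limit_bytes / 4   # indexable characters at 4 bytes each (utf8mb4)
```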

Please check these Yehuda Katz articles, which cover this in depth and explain it very well (there is specifically a section 'Incompatible Encodings'):

http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer-and-the-solution-for-rails/

http://yehudakatz.com/2010/05/17/encodings-unabridged/

and:

http://zargony.com/2009/07/24/ruby-1-9-and-file-encodings

http://graysoftinc.com/character-encodings
