Ruby 1.9 wrong file encoding on Windows

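Problem description

I have a Ruby file with these contents:

# encoding: iso-8859-1
File.open('foo.txt', "w:iso-8859-1") {|f| f << 'fòo'}
puts File.read('foo.txt').encoding

  • When I run it from the Windows command prompt with Ruby 1.9.3, I get: IBM437

  • When I run it from Cygwin with Ruby 1.9.3, I get: UTF-8

  • What I expected to get: ISO-8859-1

Can someone explain what's happening here?

UPDATE

Here's a better description of what I'm looking for:

  • I understand now, thanks to Darshan, that by default Ruby loads files in Encoding.default_external, but shouldn't the # encoding: iso-8859-1 line override that?

  • Should Ruby be able to auto-detect a file's encoding? Is there any filesystem where the encoding is an attribute?

  • What is my best option to 'remember' the encoding I saved the file in?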

Solution

You're not specifying the encoding when you read the file. You're being very careful to specify it everywhere except there, but then you're reading it with the default encoding.

File.open('foo.txt', "w:iso-8859-1") {|f| f << 'fòo'.force_encoding('iso-8859-1')}
File.open('foo.txt', "r:iso-8859-1") {|f| puts f.read().encoding }

# => ISO-8859-1

Also note that you probably mean 'fòo'.encode('iso-8859-1') rather than 'fòo'.force_encoding('iso-8859-1'). The latter leaves the bytes unchanged, while the former transcodes the string.
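To make the difference between the two calls concrete, here is a minimal sketch that compares the bytes each one produces. It builds the string with a \u escape (so it is UTF-8 no matter how the source file is saved), and the variable names are only illustrative.

```ruby
# "f\u00F2o" is 'fòo' as a UTF-8 string; the \u escape forces UTF-8.
utf8 = "f\u00F2o"

# encode transcodes: the bytes change so that they spell the same
# characters in ISO-8859-1.
transcoded = utf8.encode('iso-8859-1')

# force_encoding only relabels: the bytes stay as they are and just the
# attached Encoding changes. dup first, because it modifies the receiver.
relabeled = utf8.dup.force_encoding('iso-8859-1')

puts utf8.bytes.to_a.inspect        # [102, 195, 178, 111]
puts transcoded.bytes.to_a.inspect  # [102, 242, 111]
puts relabeled.bytes.to_a.inspect   # [102, 195, 178, 111]
puts relabeled.length               # 4 (each byte now counts as one character)
```

The relabeled string still holds the two UTF-8 bytes for ò, so anything that trusts the ISO-8859-1 label will see two separate characters there.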

Update: I'll elaborate a bit since I wasn't as clear or thorough as I could have been.

  1. If you don't specify an encoding with File.read(), the file will be read with Encoding.default_external. Since you're not setting that yourself, Ruby is using a value depending on the environment it's run in. In your Windows environment, it's IBM437; in your Cygwin environment, it's UTF-8. So my point above was that of course that's what the encoding is; it has to be, and it has nothing to do with what bytes are contained in the file. Ruby doesn't auto-detect encodings for you.

  2. force_encoding() doesn't change the bytes in a string, it only changes the Encoding attached to those bytes. If you tell Ruby "pretend this string is ISO-8859-1", then it won't transcode them when you tell it "please write this string as ISO-8859-1". encode() transcodes for you, as does writing to the file if you don't trick it into not doing so.
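Point 1 can be checked with a small sketch like the one below. It assumes Encoding.default_internal is not set (the usual case), writes the three ISO-8859-1 bytes of 'fòo' into a scratch file in a temporary directory, and reads them back twice. The path and variable names are hypothetical.

```ruby
require 'tmpdir'

default_read, explicit_read = Dir.mktmpdir do |dir|
  path = File.join(dir, 'foo.txt')  # hypothetical scratch file

  # Write the raw ISO-8859-1 bytes for 'fòo' (0x66 0xF2 0x6F) in binary mode.
  File.open(path, 'wb') { |f| f.write([0x66, 0xF2, 0x6F].pack('C*')) }

  # No encoding given: the string is tagged with Encoding.default_external.
  by_default = File.read(path)

  # Encoding given in the mode string: the string is tagged ISO-8859-1.
  by_mode = File.open(path, 'r:iso-8859-1') { |f| f.read }

  [by_default, by_mode]
end

puts Encoding.default_external
puts default_read.encoding == Encoding.default_external    # true
puts explicit_read.encoding                                # ISO-8859-1
puts default_read.bytes.to_a == explicit_read.bytes.to_a   # true
```

Both reads return the same bytes; only the label differs, which is why the result changes between the Windows prompt and Cygwin.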

Putting those together, if you have a source file in ISO-8859-1:

# encoding: iso-8859-1

# Write in ISO-8859-1 regardless of default_external
File.open('foo.txt', "w:iso-8859-1") {|f| f << 'fòo'}

# Read in ISO-8859-1 regardless of default_external,
#  transcoding if necessary to default_internal, if set
File.open('foo.txt', "r:iso-8859-1") {|f| puts f.read().encoding } # => ISO-8859-1

puts File.read('foo.txt').encoding # -> Whatever is specified by default_external

If you have a source file in UTF-8:

# encoding: utf-8

# Write in ISO-8859-1 regardless of default_external, transcoding from UTF-8
File.open('foo.txt', "w:iso-8859-1") {|f| f << 'fòo'}

# Read in ISO-8859-1 regardless of default_external,
#  transcoding if necessary to default_internal, if set
File.open('foo.txt', "r:iso-8859-1") {|f| puts f.read().encoding } # => ISO-8859-1

puts File.read('foo.txt').encoding # -> Whatever is specified by default_external
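To see what actually lands on disk in the UTF-8 case, a sketch along these lines reads the file back as raw bytes. It uses a temporary directory and illustrative variable names, and builds the string with a \u escape so it is UTF-8 regardless of the source encoding.

```ruby
require 'tmpdir'

transcoded_bytes, relabeled_bytes = Dir.mktmpdir do |dir|
  path = File.join(dir, 'foo.txt')  # hypothetical scratch file

  # A UTF-8 string written through "w:iso-8859-1" is transcoded on the way out.
  File.open(path, 'w:iso-8859-1') { |f| f << "f\u00F2o" }
  written_normally = File.binread(path).bytes.to_a

  # A UTF-8 string relabeled with force_encoding already claims to be
  # ISO-8859-1, so Ruby writes its bytes through unchanged.
  tricked = "f\u00F2o".dup.force_encoding('iso-8859-1')
  File.open(path, 'w:iso-8859-1') { |f| f << tricked }
  written_relabeled = File.binread(path).bytes.to_a

  [written_normally, written_relabeled]
end

puts transcoded_bytes.inspect  # [102, 242, 111]
puts relabeled_bytes.inspect   # [102, 195, 178, 111]
```

The second result is a file that still contains UTF-8 bytes even though it was opened for ISO-8859-1 output.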

Update 2, to answer your new questions:

  1. No, the # encoding: iso-8859-1 line does not change Encoding.default_external, it only tells Ruby that the source file itself is encoded in ISO-8859-1. Simply add

    Encoding.default_external = "iso-8859-1"
    

    if you expect all files that you read to be stored in that encoding.

  2. No, I don't personally think Ruby should auto-detect encodings, but reasonable people can disagree on that one, and a discussion of "should it be so" seems off-topic here.

  3. Personally, I use UTF-8 for everything, and in the rare circumstances that I can't control encoding, I manually set the encoding when I read the file, as demonstrated above. My source files are always in UTF-8. If you're dealing with files that you can't control and don't know the encoding of, the charguess gem or similar would be useful.
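As a rough sketch of the two options from points 1 and 3, the snippet below reads the same ISO-8859-1 file once after changing Encoding.default_external and once with a per-call :encoding option. It assumes Encoding.default_internal is not set; the scratch path and variable names are hypothetical, and the original default is restored afterwards so the change does not leak into other code.

```ruby
require 'tmpdir'

saved_external = Encoding.default_external

global_read, per_call_read = Dir.mktmpdir do |dir|
  path = File.join(dir, 'foo.txt')  # hypothetical scratch file

  # Raw ISO-8859-1 bytes for 'fòo'.
  File.open(path, 'wb') { |f| f.write([0x66, 0xF2, 0x6F].pack('C*')) }

  begin
    # Option 1: change the process-wide default for every later read.
    Encoding.default_external = 'iso-8859-1'
    via_default = File.read(path)
  ensure
    Encoding.default_external = saved_external
  end

  # Option 2: leave the default alone and name the encoding per read.
  via_option = File.read(path, :encoding => 'iso-8859-1')

  [via_default, via_option]
end

puts global_read.encoding                       # ISO-8859-1
puts per_call_read.encoding                     # ISO-8859-1
puts global_read.encode('utf-8') == "f\u00F2o"  # true
```

The per-call form keeps the knowledge of the file's encoding next to the read that depends on it, which is closer to the "set it manually when I read the file" approach described above.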
