How to avoid tripping over UTF-8 BOM when reading files

Question

I'm consuming a data feed that has recently added a Unicode BOM header (U+FEFF), and my rake task is now messed up by it.

I can skip the first 3 bytes with file.gets[3..-1] but is there a more elegant way to read files in Ruby which can handle this correctly, whether a BOM is present or not?
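
If the text is already in memory (for example a feed body fetched over HTTP, which is an assumption here), the skip can be made conditional so nothing is lost when no BOM is present. Note also that if the line was read as UTF-8, file.gets[3..-1] counts characters rather than bytes, so it would drop the single BOM character plus two real characters. The sketch below uses a hypothetical helper, strip_bom, assumes the input bytes are UTF-8, and needs Ruby 2.5+ for String#delete_prefix.

```ruby
# U+FEFF is a single character in a UTF-8 string (three bytes on disk: EF BB BF).
UTF8_BOM = "\uFEFF"

# Hypothetical helper: remove a leading BOM only if it is actually there.
# Assumes the input bytes are UTF-8; force_encoding only relabels, it does not convert.
def strip_bom(text)
  utf8 = text.dup.force_encoding(Encoding::UTF_8)
  utf8.delete_prefix(UTF8_BOM)
end

with_bom    = "\xEF\xBB\xBFid,name\n".b  # raw bytes, as they might arrive from a feed
without_bom = "id,name\n"

puts strip_bom(with_bom).inspect     # prints "id,name\n"
puts strip_bom(without_bom).inspect  # prints "id,name\n"
```

For files, the IO-level option in the answer below is usually simpler, because the BOM never reaches your strings at all.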

Answer

Since Ruby 1.9.2 you can open the file with the mode r:bom|utf-8:

# Block form: File.open returns whatever the block returns
text_without_bom = File.open('file.txt', 'r:bom|utf-8') { |file| file.read }

# One-liner with the encoding: option
text_without_bom = File.read('file.txt', encoding: 'bom|utf-8')

# One-liner with the mode: option
text_without_bom = File.read('file.txt', mode: 'r:bom|utf-8')

It doesn't matter whether the file actually starts with a BOM: if one is present it is skipped, and if not, the file is simply read as UTF-8 from the first byte.
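
A small self-check of that claim: the sketch below writes two temporary files with the same content, only one of which starts with a BOM, and reads both through a hypothetical helper, read_without_bom, that wraps the mode: form shown above.

```ruby
require 'tmpdir'

# Hypothetical helper: read a file as UTF-8, skipping a leading BOM only if present.
def read_without_bom(path)
  File.read(path, mode: 'r:bom|utf-8')
end

Dir.mktmpdir do |dir|
  # Two sample files with the same text; only the first starts with EF BB BF.
  with_bom    = File.join(dir, 'with_bom.txt')
  without_bom = File.join(dir, 'without_bom.txt')
  File.binwrite(with_bom, "\xEF\xBB\xBFhello\n")
  File.binwrite(without_bom, "hello\n")

  a = read_without_bom(with_bom)
  b = read_without_bom(without_bom)
  puts a == b      # prints true
  puts a.encoding  # prints UTF-8
end
```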

You can pass the same encoding option to other methods as well, for example File.readlines:

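# The 2nd positional argument of File.readlines is the line separator,
# so the encoding must be passed as a keyword option
lines_without_bom = File.readlines(@filename, encoding: 'bom|utf-8')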

(You get an array with all lines).
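
To check that the first element really comes back without the BOM, here is a minimal sketch using a temporary file; read_lines_without_bom is a hypothetical helper name, and the chomp: option (Ruby 2.4+) is added only to make the output easier to read.

```ruby
require 'tmpdir'

# Hypothetical helper: all lines of a UTF-8 file, without a leading BOM
# and without trailing newlines.
def read_lines_without_bom(path)
  File.readlines(path, encoding: 'bom|utf-8', chomp: true)
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'feed.txt')
  File.binwrite(path, "\xEF\xBB\xBFfirst\nsecond\n")

  p read_lines_without_bom(path)  # prints ["first", "second"]
end
```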

Or with CSV:

require 'csv'
CSV.open(@filename, 'r:bom|utf-8'){|csv|
  csv.each{ |row| p row }
}
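
A common symptom of an unhandled BOM in a CSV feed is that the first header silently becomes "\uFEFFid" instead of "id", so row['id'] returns nil. The sketch below (hypothetical helper read_feed_rows, sample data made up for illustration) combines the mode above with headers: true and shows the first header coming back clean.

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical helper: parse a CSV feed that may start with a UTF-8 BOM
# into an array of hashes keyed by header name.
def read_feed_rows(path)
  # CSV.open returns the value of its block, here the mapped rows.
  CSV.open(path, 'r:bom|utf-8', headers: true) { |csv| csv.map(&:to_h) }
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'feed.csv')
  File.binwrite(path, "\xEF\xBB\xBFid,name\n1,Alice\n2,Bob\n")

  rows = read_feed_rows(path)
  p rows.first.keys  # prints ["id", "name"]
end
```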