How to avoid tripping over UTF-8 BOM when reading files

Views: 140 | Published: 2017/10/26 20:30:11 | Tags: ruby, file, unicode, byte-order-mark

Question

I'm consuming a data feed that has recently added a Unicode BOM header (U+FEFF), and my rake task now chokes on it.

I can skip the first 3 bytes with file.gets[3..-1], but is there a more elegant way to read files in Ruby that handles this correctly whether or not a BOM is present?

Recommended answer

With Ruby 1.9.2 or later, you can use the mode r:bom|utf-8:

text_without_bom = nil #define the variable outside the block to keep the data
File.open('file.txt', "r:bom|utf-8"){|file|
  text_without_bom = file.read
}

or

text_without_bom = File.read('file.txt', encoding: 'bom|utf-8')

or

text_without_bom = File.read('file.txt', mode: 'r:bom|utf-8')

It doesn't matter whether the BOM is present in the file or not.
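To check that claim, here is a minimal sketch that writes two temporary files, one starting with the UTF-8 BOM bytes (EF BB BF) and one without, and reads both with the same mode. The helper name read_without_bom and the sample text are made up for illustration and are not part of the original answer.

```ruby
require 'tempfile'

# Hypothetical helper: read a UTF-8 file, dropping a leading BOM if there is one.
def read_without_bom(path)
  File.read(path, mode: 'r:bom|utf-8')
end

bom_bytes = "\xEF\xBB\xBF".b        # U+FEFF encoded as UTF-8, kept as raw bytes
sample    = "id,name\n1,Ruby\n".b   # illustrative file content

# File that starts with a BOM.
with_bom = Tempfile.new('with_bom')
with_bom.binmode
with_bom.write(bom_bytes + sample)
with_bom.close

# Same content, no BOM.
without_bom = Tempfile.new('without_bom')
without_bom.binmode
without_bom.write(sample)
without_bom.close

text_a = read_without_bom(with_bom.path)
text_b = read_without_bom(without_bom.path)

puts text_a == text_b               # same text from both files
puts text_a.start_with?("\uFEFF")   # whether a BOM character survived the read
```

Because the same call covers both cases, the reading code does not need a separate branch for feeds that may or may not include the BOM.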

You can also pass the encoding to other methods:

# Pass the mode as a keyword; a second positional string is taken as the line separator.
text_without_bom = File.readlines(@filename, mode: 'r:bom|utf-8')

(You get an array with all lines.)

Or with CSV:

require 'csv'
CSV.open(@filename, 'r:bom|utf-8'){|csv|
  csv.each{ |row| p row }
}
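
With CSV the BOM tends to show up in the first header name, which is one way a task that looks up columns by name can break. The sketch below illustrates this under the assumption of a small two-column feed. The helper first_header and the sample data are hypothetical, and only CSV.open with a mode string and the headers option is used.

```ruby
require 'csv'
require 'tempfile'

# Illustrative feed: UTF-8 BOM, then a header row and one data row.
feed = Tempfile.new(['feed', '.csv'])
feed.binmode
feed.write("\xEF\xBB\xBF".b + "id,name\n1,Ruby\n".b)
feed.close

# Hypothetical helper: name of the first column as seen by CSV for a given mode.
def first_header(path, mode)
  CSV.open(path, mode, headers: true) { |csv| csv.shift.headers.first }
end

raw_header   = first_header(feed.path, 'r:utf-8')      # BOM is left in the data
clean_header = first_header(feed.path, 'r:bom|utf-8')  # BOM is consumed on open

p raw_header == 'id'     # compare the plain-mode header with the expected name
p clean_header == 'id'   # compare the BOM-aware header with the expected name
```

In plain r:utf-8 mode, a lookup such as row['id'] would miss the first column, because its header name carries the invisible BOM character.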
