Read a UTF-8 text file with BOM
Question
I have a text file with a byte order mark (U+FEFF) at the beginning. I am trying to read the file in R. Is it possible to avoid the byte order mark?
The function fread (from the data.table package) reads the file, but adds ļ»æ at the beginning of the first variable name:
> names(frame_pers)[1]
[1] "ļ»æreg_date"
The same goes for the read.csv function.
Currently I have a function that removes the BOM from the first column name, but I believe there should be a way to strip the BOM automatically.
remove.BOM <- function(x) setnames(x, 1, substring(names(x)[1], 4))
> names(frame_pers)[1]
[1] "ļ»æreg_date"
> remove.BOM(frame_pers)
> names(frame_pers)[1]
[1] "reg_date"
I am using the native encoding for the R session:
> options("encoding" = "")
> options("encoding")
$encoding
[1] ""
Recommended answer
Have you tried read.csv(..., fileEncoding = "UTF-8-BOM")? The ?file help page says:
As from R 3.0.0 the encoding ‘"UTF-8-BOM"’ is accepted and will remove a Byte Order Mark if present (which it often is for files and webpages generated by Microsoft applications).
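To make the suggestion concrete, here is a minimal sketch in base R. It assumes R 3.0.0 or later; the temporary file, its contents, and the value column are made up for illustration, and only the reg_date column name comes from the question.

```r
# Build a small CSV that starts with the UTF-8 BOM bytes (EF BB BF).
# The file contents are hypothetical; only "reg_date" comes from the question.
path <- tempfile(fileext = ".csv")
con <- file(path, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("reg_date,value\n2015-01-01,1\n"), con)
close(con)

# Ask the connection to decode as UTF-8 and drop a leading BOM if present.
frame_pers <- read.csv(path, fileEncoding = "UTF-8-BOM")

# The first column name should now be free of the BOM prefix.
first_name <- names(frame_pers)[1]
print(first_name)

unlink(path)
```

With this approach the manual remove.BOM step from the question should no longer be needed for files read through read.csv.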