Read a UTF-8 text file with BOM

Question

I have a text file with a byte order mark (U+FEFF) at the beginning. I am trying to read the file in R. Is it possible to avoid the byte order mark?

The function fread (from the data.table package) reads the file, but adds ļ»æ at the beginning of the first variable name:

> names(frame_pers)[1]
[1] "ļ»æreg_date"

The same happens with the read.csv function.

Currently I have written a function that removes the BOM from the first column name, but I believe there should be a way to strip the BOM automatically.

remove.BOM <- function(x) setnames(x, 1, substring(names(x)[1], 4))

> names(frame_pers)[1]
[1] "ļ»æreg_date"
> remove.BOM(frame_pers)
> names(frame_pers)[1]
[1] "reg_date"
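
The fixed offset in remove.BOM assumes the BOM always shows up as exactly three characters, which holds when the three BOM bytes (EF BB BF) are each decoded as a separate character, as in the session above. A minimal sketch of a byte-based alternative follows; strip_bom is a hypothetical helper name (not part of base R or data.table), and it only drops the prefix when those bytes are actually present:

```r
# Hypothetical helper (an assumption for illustration, not a library function):
# remove a leading UTF-8 BOM from a single string by inspecting its raw bytes.
strip_bom <- function(x) {
  bom <- as.raw(c(0xEF, 0xBB, 0xBF))   # UTF-8 encoding of U+FEFF
  bytes <- charToRaw(x)
  if (length(bytes) >= 3 && identical(bytes[1:3], bom)) {
    bytes <- bytes[-(1:3)]             # drop only the BOM bytes
  }
  rawToChar(bytes)
}

# Build a name that starts with the BOM bytes, then clean it.
with_bom <- rawToChar(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw("reg_date")))
cleaned <- strip_bom(with_bom)
unchanged <- strip_bom("reg_date")     # names without a BOM are left alone
```

With data.table loaded, this could be applied as setnames(frame_pers, 1, strip_bom(names(frame_pers)[1])). Note that rawToChar() does not mark the result's encoding, so this sketch is best suited to ASCII column names.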

Native encoding of the R session:

> options("encoding" = "")
> options("encoding")
$encoding
[1] ""


Recommended answer

Have you tried read.csv(..., fileEncoding = "UTF-8-BOM")? The help page ?file says:


As from R 3.0.0 the encoding ‘"UTF-8-BOM"’ is accepted and will remove a Byte Order Mark if present (which it often is for files and webpages generated by Microsoft applications).
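
As a rough illustration of that suggestion, the sketch below writes a small CSV that starts with the BOM bytes and reads it back with fileEncoding = "UTF-8-BOM". The temporary file and its contents are made up for the example; only the read.csv call reflects the answer itself.

```r
# Create a throwaway CSV whose first three bytes are the UTF-8 BOM (EF BB BF).
# The file contents here are illustrative, not the asker's data.
path <- tempfile(fileext = ".csv")
con <- file(path, "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("reg_date,value\n2014-01-01,1\n"), con)
close(con)

# "UTF-8-BOM" tells the connection to skip a leading BOM if one is present.
frame_pers <- read.csv(path, fileEncoding = "UTF-8-BOM")
first_name <- names(frame_pers)[1]
print(first_name)

unlink(path)
```

The first column name should come back as plain reg_date, so no post-hoc renaming is needed.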
