Getting rid of BOM between SAS and R

Views: 40 · Posted: 2021/7/14 20:38:14 · Tags: r, sas, byte-order-mark


Problem description

I used SAS to save a tab-delimited text file with UTF-8 encoding on a Windows machine. Then I tried to open it in R:

read.table(myfile, header = TRUE, sep = "\t")

To my surprise, the data was corrupted, but in a subtle way. Numeric values changed seemingly at random while the overall layout looked normal, so it took me a while to notice the problem, which I now assume is caused by the BOM (byte order mark).

This is not a new issue, of course; it is addressed briefly here, with the recommendation to use

read.table(myfile, fileEncoding = "UTF-8", header = TRUE, sep = "\t")

However, this made no improvement! The only thing that worked was to suppress the header, with or without the fileEncoding argument:

read.table(myfile, fileEncoding = "UTF-8", header = FALSE, sep = "\t")
read.table(myfile, header = FALSE, sep = "\t")

In either case, I have to work around the missing header by replacing the column names with the first row, but only after removing the BOM that appears at the beginning of the first column name (<U+FEFF> if I use fileEncoding, and ï»¿ if I don't).
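The two forms of the stray prefix are the same three bytes seen through different decodings. The UTF-8 BOM is the byte sequence EF BB BF: decoded as UTF-8 it is the single character U+FEFF, and decoded as Latin-1 it shows up as ï»¿. A minimal sketch of checking a file for it in R might look like the following; the temporary file, its contents, and the helper name `has_utf8_bom` are made up for illustration and are not part of the original question.

```r
# Write a small tab-delimited file that starts with the UTF-8 BOM.
# The file and its contents are invented for this sketch.
tmp <- tempfile(fileext = ".txt")
con <- file(tmp, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)               # the BOM bytes
writeBin(charToRaw("id\tvalue\n1\t10\n2\t20\n"), con)    # the actual data
close(con)

# Hypothetical helper: TRUE if the file begins with EF BB BF.
has_utf8_bom <- function(path) {
  first <- readBin(path, what = "raw", n = 3L)
  identical(first, as.raw(c(0xEF, 0xBB, 0xBF)))
}

has_bom <- has_utf8_bom(tmp)

# The same three bytes, deliberately misread as Latin-1 and converted
# to UTF-8 for display; this is the garbled prefix seen without fileEncoding.
as_latin1 <- iconv(rawToChar(as.raw(c(0xEF, 0xBB, 0xBF))),
                   from = "latin1", to = "UTF-8")
```

Checking the raw bytes avoids guessing from how the first column name happens to print.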

Isn't there a simple way to just remove the BOM and use read.table without any special arguments?
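One way to approach that, sketched under assumptions, is to rewrite the file once without its first three bytes and then read the copy with a plain read.table call. The helper name `strip_utf8_bom`, the temporary files, and the sample data below are hypothetical; the sketch loads the whole file into memory, so it only suits files of moderate size.

```r
# Hypothetical helper: copy src to dest, dropping a leading UTF-8 BOM if present.
strip_utf8_bom <- function(src, dest) {
  bytes <- readBin(src, what = "raw", n = file.size(src))
  bom <- as.raw(c(0xEF, 0xBB, 0xBF))
  if (length(bytes) >= 3L && identical(bytes[1:3], bom)) {
    bytes <- bytes[-(1:3)]
  }
  writeBin(bytes, dest)
  invisible(dest)
}

# Build an invented BOM-prefixed input file for the demonstration.
with_bom <- tempfile(fileext = ".txt")
con <- file(with_bom, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("id\tvalue\n1\t10\n2\t20\n"), con)
close(con)

# Strip the BOM, then read the clean copy without any encoding argument.
clean <- tempfile(fileext = ".txt")
strip_utf8_bom(with_bom, clean)
df <- read.table(clean, header = TRUE, sep = "\t")
```

This only removes the BOM; it does not re-encode the file, so any non-ASCII text in the data is still UTF-8 and may need fileEncoding on platforms whose native encoding differs.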

Update for @Joe: the SAS code I used:

FILENAME myfile 'C:\Documents ... file.txt'  encoding="utf-8";
proc export data=lib.sastable
  outfile=myfile
  dbms=tab  replace;
  putnames=yes;
run;

Update on further weirdness: Using fileEncoding="UTF-8-BOM", as @Joe suggested in his solution below, seems to remove the BOM. However, it did not fix my original motivating problem, which is corruption in the data: the header row is fine, but the last few digits of the first column of numbers get messed up. I'll give Joe credit for his answer. Maybe my problem is not actually a BOM issue?

Hack solution: Use fileEncoding="UTF-8-BOM" and also include the argument colClasses = "character". I have no idea why this fixes the data corruption issue; that could be the topic of a future question.
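The question leaves the cause open. One possible explanation, which the source does not confirm, is that the first column holds identifiers with more significant digits than a double can represent exactly (roughly 15 to 17), so read.table converts them to numeric and the trailing digits change. The sketch below illustrates that hypothesis only; the file, column names, and the 20-digit value are invented.

```r
# Build an invented BOM-prefixed file whose first column is a long numeric ID.
tmp <- tempfile(fileext = ".txt")
con <- file(tmp, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("id\tscore\n12345678901234567890\t1.5\n"), con)
close(con)

# The hack from the post: strip the BOM and keep every column as text.
as_text <- read.table(tmp, fileEncoding = "UTF-8-BOM", header = TRUE,
                      sep = "\t", colClasses = "character")

# Without colClasses, the ID column is converted to a double.
as_number <- read.table(tmp, fileEncoding = "UTF-8-BOM", header = TRUE,
                        sep = "\t")

# Printing the double back as a whole number shows whether digits survived.
round_trip <- sprintf("%.0f", as_number$id)
digits_preserved <- identical(round_trip, as_text$id)
```

If this is what happened, reading only the affected column as character (for example with a named colClasses vector) would keep the other columns numeric.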
