Reading an RData file with a different encoding

Question

I have an .RData file to read on my Linux (UTF-8) machine, but I know the file is in Latin1 because I created it myself on Windows. Unfortunately, I no longer have access to the original data or to a Windows machine, so I need to read the file on Linux.

To read an RData file, the normal procedure is to run load("file.Rdata"). Functions such as read.csv have an encoding argument that you can use to solve this kind of issue, but load has no such thing. If I try load("file.Rdata", encoding = "latin1"), I just get this (expected) error:

Error in load("file.Rdata", encoding = "latin1") : unused argument (encoding = "latin1")

What else can I do? My files contain text variables with accented characters, and those get corrupted when the file is opened in a UTF-8 environment.
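One way to confirm that this is an encoding problem, and to see which columns are affected, is to look for character columns whose bytes are not valid UTF-8. The sketch below is only illustrative: the data frame is built by hand to stand in for a loaded object, and the helper name find.bad.columns is made up for this example.

```r
# Stand-in for a data frame saved on Windows: the byte 0xE7 is "ç" in
# Latin-1 but is not valid on its own in UTF-8.
latin1.word <- rawToChar(as.raw(c(0x66, 0x61, 0xe7, 0x61, 0x64, 0x65)))
df <- data.frame(id = 1:2,
                 word = c(latin1.word, "plain"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: names of character columns holding invalid UTF-8.
find.bad.columns <- function(df) {
  bad <- vapply(df,
                function(col) is.character(col) && !all(validUTF8(col)),
                logical(1))
  names(df)[bad]
}

bad.columns <- find.bad.columns(df)
```

Columns reported this way are the likely candidates for re-marking; pure ASCII columns pass the check and need no fix.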

Recommended answer

Thanks to 42's comment, I've managed to write a function that fixes the encoding of the loaded data:

fix.encoding <- function(df, originalEncoding = "latin1") {
  for (col in seq_len(ncol(df))) {
    # Encoding<- only accepts character vectors, so skip other column types
    if (is.character(df[, col])) Encoding(df[, col]) <- originalEncoding
  }
  df
}

The key step is the assignment Encoding(df[, col]) <- "latin1", which marks column col of the data frame df as Latin1 so that R interprets its bytes correctly. This only declares the encoding; the underlying bytes are left unchanged. Encoding accepts only character vectors, not whole data frames, so I had to write a function that loops over the columns and applies it to each one.
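To make the effect of Encoding concrete, here is a small sketch on a single string. It assumes nothing beyond base R; the sample word and the variable names are chosen for illustration. It shows that assigning an encoding leaves the bytes alone, and that an explicit conversion (enc2utf8 or iconv) is what produces UTF-8 bytes.

```r
# Bytes of "façade" in Latin-1 (0xE7 is "ç"); built from raw bytes so the
# example does not depend on the encoding of this script.
bytes <- as.raw(c(0x66, 0x61, 0xe7, 0x61, 0x64, 0x65))
x <- rawToChar(bytes)

before <- Encoding(x)        # no declared encoding yet
Encoding(x) <- "latin1"      # declare the encoding; bytes stay the same
after <- Encoding(x)
same.bytes <- identical(charToRaw(x), bytes)

# To actually re-encode the text as UTF-8, convert it explicitly.
y <- enc2utf8(x)             # or: iconv(x, from = "latin1", to = "UTF-8")
utf8.bytes <- charToRaw(y)   # "ç" is now the two bytes c3 a7
```

Marking is usually enough for printing and comparison inside R; converting may be preferable if the data will be written back out for other UTF-8 tools.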

Of course, if the problem affects only a couple of columns, you're better off applying Encoding to just those columns rather than the whole data frame (you can modify the function above to take a set of columns as input). Also, if you're facing the inverse problem, i.e. reading an R object created on Linux or Mac OS into Windows, you should use originalEncoding = "UTF-8".
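A column-restricted version might look like the sketch below. The name fix.encoding.cols and the cols argument are hypothetical, and the sample data frame is constructed by hand to stand in for an object loaded from the .RData file.

```r
# Hypothetical variant that only touches the columns named in `cols`.
fix.encoding.cols <- function(df, cols = names(df),
                              originalEncoding = "latin1") {
  for (col in cols) {
    # Encoding<- needs a character vector, so leave other types alone.
    if (is.character(df[[col]])) Encoding(df[[col]]) <- originalEncoding
  }
  df
}

# Stand-in data: two text columns holding Latin-1 bytes (0xE7 is "ç").
latin1.word <- rawToChar(as.raw(c(0x66, 0x61, 0xe7, 0x61, 0x64, 0x65)))
df <- data.frame(id = 1:2,
                 word = c(latin1.word, "plain"),
                 note = c(latin1.word, "x"),
                 stringsAsFactors = FALSE)

# Only `word` is re-marked; `note` is left as it was loaded.
fixed <- fix.encoding.cols(df, cols = "word")
```

For the inverse case described above, the same call would be made with originalEncoding = "UTF-8".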
