Reading Rdata file with different encoding


Problem description

I have an .RData file to read on my Linux (UTF-8) machine, but I know the file is in Latin1 because I created it myself on Windows. Unfortunately, I don't have access to the original files or a Windows machine, and I need to read the file on my Linux machine.

To read an Rdata file, the normal procedure is to run load("file.Rdata"). Functions such as read.csv have an encoding argument that you can use to solve this kind of issue, but load has no such thing. If I try load("file.Rdata", encoding = "latin1"), I just get this (expected) error:

Error in load("file.Rdata", encoding = "latin1") : unused argument (encoding = "latin1")

What else can I do? My files contain text variables with accented characters that get corrupted when opened in a UTF-8 environment.

Solution

Thanks to 42's comment, I've managed to write a function to recode the file:

fix.encoding <- function(df, originalEncoding = "latin1") {
  # Encoding<- only accepts character vectors, so other column types are skipped
  for (col in seq_along(df)) {
    if (is.character(df[[col]])) Encoding(df[[col]]) <- originalEncoding
  }
  df
}

The meat here is the Encoding<- assignment, which marks the strings in one column of dataframe df as Latin1. It does not convert anything: the bytes stay the same and R is only told how to interpret them. Unfortunately, Encoding only accepts character vectors, not a whole dataframe, so I had to create a function that sweeps over all columns of a dataframe and applies the mark to each one.
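To make the difference between marking and converting concrete, here is a minimal sketch. The string is built from raw bytes so the example does not depend on the locale or on how this page is encoded. The variable names are illustrative only, and iconv is shown as one possible way to re-encode, not as part of the original answer.

```r
# "café" with "é" stored as the single Latin-1 byte 0xE9
x <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))
Encoding(x)                      # "unknown": no encoding is declared yet

# Declare the encoding. Only the mark changes, the bytes do not.
Encoding(x) <- "latin1"
bytes_marked <- charToRaw(x)     # 63 61 66 e9

# Re-encode to UTF-8. Here the bytes do change: "é" becomes 0xC3 0xA9.
y <- iconv(x, from = "latin1", to = "UTF-8")
bytes_utf8 <- charToRaw(y)       # 63 61 66 c3 a9
```

Marking is usually enough for R to display and compare the text correctly. Converting with iconv may be preferable if the data will be written back out for tools that expect UTF-8.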

Of course, if your problem is in just a couple of columns, you're better off applying Encoding to those columns only instead of the whole dataframe (you can modify the function above to take a set of columns as input). Also, if you're facing the inverse problem, i.e. reading an R object created on Linux or macOS into Windows, you should use originalEncoding = "UTF-8".
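One way the column-restricted variant might look is sketched below. The name fix.encoding.cols, its cols argument, the factor branch, and the survey data are assumptions made for illustration and are not part of the original answer. The round trip through a temporary file only simulates an .RData file written on a Latin1 system.

```r
# Hypothetical variant of fix.encoding that takes the columns to fix
fix.encoding.cols <- function(df, cols = names(df), originalEncoding = "latin1") {
  for (col in cols) {
    if (is.character(df[[col]])) {
      Encoding(df[[col]]) <- originalEncoding
    } else if (is.factor(df[[col]])) {
      # A factor stores its text in the levels, so mark those instead
      Encoding(levels(df[[col]])) <- originalEncoding
    }
  }
  df
}

# Simulate a file saved on a Latin-1 system: "café" with "é" as byte 0xE9
latin1_str <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))
survey <- data.frame(city = latin1_str, note = latin1_str, n = 1,
                     stringsAsFactors = FALSE)
path <- tempfile(fileext = ".RData")
save(survey, file = path)

# Load into a separate environment so nothing in the workspace is overwritten
e <- new.env()
load(path, envir = e)
fixed <- fix.encoding.cols(e$survey, cols = "city")

Encoding(fixed$city)  # "latin1"
Encoding(fixed$note)  # "unknown" (this column was not selected)
```

Loading into its own environment also makes it easy to loop over every object in the file, for example with ls(e), and fix only the dataframes.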
